October 03, 2002
A little light munging
By default Musicmatch Jukebox, the MP3 writing software supplied with the unit, arranges tracks in one directory per album, with the albums grouped in directories by artist. The file name of each individual track is then simply the track title. Simple, really. When you play these tracks in Musicmatch Jukebox you can select them in track order, but when you play them on your Archos Jukebox the tracks are played in alphabetical filename order. Which is fine, but it does ruin the enjoyment of a quality concept album like Misplaced Childhood.
Rather than change the settings in Musicmatch and then re-record all one hundred plus CDs, I decided on a radical plan. I would prepend the track number to each filename on my Jukebox, so instead of "Lavender.mp3" the file would be called "03_Lavender.mp3". A task of mere minutes, I thought, because I have Python at my command. In the end it was quite a simple task, but the process of writing that simple solution was a bit of a voyage of discovery which I thought I would share.
Luckily for our purposes each MP3 file has some metadata encoded in it by the burning software. Chapter three of Dive into Python has an example of how to read the ID3v1.0 style tags in your MP3 files. After borrowing this code I realised the flaw in my cunning plan: there is no track number in the ID3v1.0 tag. A little investigation (at id3.org) determined that I should be looking at the ID3v2 tag in my files.
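To see why the ID3v1.0 tag is a dead end, here is a rough sketch of its layout, written for Python 3 rather than the Python 2 used elsewhere in this post. The tag is the last 128 bytes of the file, and none of its fixed-width fields holds a track number. The function names parse_id3v1 and read_id3v1 are made up for this illustration and are not the Dive into Python code.

```python
import struct

# ID3v1.0 layout: "TAG" marker, then title, artist and album (30 bytes each),
# year (4 bytes), comment (30 bytes) and a one byte genre code.
# Note that there is no field for a track number.
ID3V1_FORMAT = "<3s30s30s30s4s30sB"


def parse_id3v1(block):
    """Return a dict of ID3v1.0 fields from a 128 byte block, or None."""
    if len(block) != 128 or block[:3] != b"TAG":
        return None
    marker, title, artist, album, year, comment, genre = struct.unpack(
        ID3V1_FORMAT, block)

    def clean(raw):
        # Fields are padded with NUL bytes or spaces.
        return raw.split(b"\x00")[0].decode("latin-1").strip()

    return {
        "title": clean(title),
        "artist": clean(artist),
        "album": clean(album),
        "year": clean(year),
        "comment": clean(comment),
        "genre": genre,
    }


def read_id3v1(path):
    """Read the trailing 128 bytes of path and parse them as an ID3v1.0 tag."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end of the file
        if handle.tell() < 128:
            return None
        handle.seek(-128, 2)
        return parse_id3v1(handle.read(128))
```

Calling read_id3v1 on a tagged file returns a dictionary with title, artist, album, year, comment and genre keys, and nothing that says where the track sits on the album.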
Sure enough, with a little experimentation and minimal cursing I found out how to get the relevant parts of the ID3 tag, namely the track title and track number. It's as simple as:
import os
import struct

def getTagData(directoryName, fileName):
    "Return track number and title from ID3v2 tag of fileName"
    # Binary mode, so the tag bytes are read unaltered on every platform
    file = open(os.path.join(directoryName, fileName), "rb")
    tagHeader = file.read(1024) # If it's a large header this won't be enough
    file.close()
    # Get the track number
    trackNumber = 0 # Nice default
    numberPosition = tagHeader.find("TRCK")
    if numberPosition != -1:
        # Skip the 10 byte frame header and the text encoding byte
        start = numberPosition + 11
        end = numberPosition + 13
        trackNo = tagHeader[start:end]
        if trackNo[1:] == "T":
            # A single digit followed by the start of the next frame name
            trackNo = trackNo[:1]
        try:
            trackNumber = int(trackNo)
        except ValueError:
            pass # Keep the default
    # Get the track title to a maximum of 256 characters
    trackName = ""
    filenamePosition = tagHeader.find("TIT2")
    if filenamePosition != -1:
        start = filenamePosition + 10
        startLength = filenamePosition + 4
        endLength = filenamePosition + 8
        # The frame size is four bytes, most significant first; use only the last
        length = struct.unpack('BBBB', tagHeader[startLength:endLength])[3]
        end = start + length # Only need other components if file name is more than 256 characters
        trackName = tagHeader[start:end].replace("\00", " ").strip()
    # All done, return to our calling function
    return trackNumber, trackName
Easy really. All you really need to know is that ID3v2 allows you to put as many (or as few) items within your tag as you like (what they call frames). Each frame is identified by a name; the two we are interested in here are "TRCK" for track number and "TIT2" for track name. Rather than fiendishly slice up the entire tag I just assume these frames are in the first 1024 bytes of the file and then search for those strings. What immediately follows the frame identifier varies from frame to frame, but for these two the four character identifier is followed by a four byte size, two flag bytes, a one byte text encoding marker and then the text itself. You should be able to infer the rest from the preceding code. If not, have a look at the website.
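For anyone who would rather not rely on string searching, the frames can also be walked one at a time. What follows is a minimal sketch in Python 3, assuming an ID3v2.3 or ID3v2.4 tag with no extended header and no unsynchronisation; the names syncsafe, read_frames, text_of and track_number are invented for this example.

```python
import struct


def syncsafe(data):
    """Decode a 4 byte 'syncsafe' integer (7 useful bits per byte)."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def read_frames(tag):
    """Return {frame_name: raw_body} for an ID3v2.3 or ID3v2.4 tag held in bytes."""
    if len(tag) < 10 or tag[:3] != b"ID3":
        return {}
    major = tag[3]
    if major not in (3, 4):
        return {}  # ID3v2.2 uses three character frame names
    if tag[5] & 0x40:
        return {}  # extended header present: not handled in this sketch
    end = min(len(tag), 10 + syncsafe(tag[6:10]))
    frames = {}
    pos = 10
    while pos + 10 <= end:
        name = tag[pos:pos + 4]
        if name == b"\x00\x00\x00\x00":
            break  # reached the padding
        raw_size = tag[pos + 4:pos + 8]
        # v2.4 frame sizes are syncsafe, v2.3 sizes are plain big-endian
        size = syncsafe(raw_size) if major == 4 else struct.unpack(">I", raw_size)[0]
        frames[name.decode("latin-1")] = tag[pos + 10:pos + 10 + size]
        pos += 10 + size
    return frames


def text_of(body):
    """Decode a text frame body; its first byte names the encoding."""
    if not body:
        return ""
    codecs = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}
    text = body[1:].decode(codecs.get(body[0], "latin-1"), errors="replace")
    return text.split("\x00")[0].strip()


def track_number(frames):
    """Return the number from a TRCK frame ('3' or '3/10'), or 0 if unusable."""
    text = text_of(frames.get("TRCK", b""))
    try:
        return int(text.split("/")[0])
    except ValueError:
        return 0
```

The tag size in the header says how many bytes to read, so this approach does not depend on the frames falling within the first 1024 bytes. It is longer than the search-and-slice version, which is a fair trade only if your files have large tags.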
Having mastered the ability to read the tags and garner the information needed to rename each file, I had to perform the change. In Python, this is a cinch:
def rename(directoryName, fileName):
    "Prefix fileName with its two digit track number, taken from the ID3v2 tag"
    trackNumber, trackName = getTagData(directoryName, fileName)
    if trackNumber == 0 or not trackName:
        return # We haven't picked up the tag information
    # Pad single digit track numbers so that the names sort correctly
    if trackNumber < 10:
        prefix = '0'+str(trackNumber)
    else:
        prefix = str(trackNumber)
    modifiedFileName = prefix + '_' + trackName + '.mp3'
    if modifiedFileName != fileName:
        print "New : %s\nOld : %s" % ( modifiedFileName, fileName )
        os.rename(os.path.join(directoryName, fileName), os.path.join(directoryName, modifiedFileName))
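The name building step can be made a little more defensive. This is a sketch in Python 3 under two assumptions of mine: that the device uses a Windows-style file system which rejects certain characters in names, and that an existing file should never be overwritten. The helpers numbered_name and plan_rename are hypothetical names, and plan_rename only works out what to do without touching the disk.

```python
import os


def numbered_name(track_number, title, extension=".mp3"):
    """Build a name such as '03_Lavender.mp3' from a track number and title."""
    # Swap out characters that Windows-style file systems reject in names.
    safe_title = "".join(
        "_" if ch in '\\/:*?"<>|' else ch for ch in title).strip()
    # %02d pads single digit numbers with a leading zero.
    return "%02d_%s%s" % (track_number, safe_title, extension)


def plan_rename(directory, old_name, track_number, title):
    """Return (source, target) paths, or None when nothing should be renamed."""
    if track_number <= 0 or not title:
        return None  # no usable tag information
    new_name = numbered_name(track_number, title)
    if new_name == old_name:
        return None  # already renamed on an earlier run
    target = os.path.join(directory, new_name)
    if os.path.exists(target):
        return None  # never overwrite a different file
    return os.path.join(directory, old_name), target
```

Because the new name is derived from the tag and not from the old name, running the script twice over the same collection changes nothing the second time.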
With these two building blocks it's a simple matter to go through my entire MP3 collection. Well, almost. First we have to find them all. If you remember from the top of this piece, my collection is organised into a hierarchy of directories.
The Python library comes to our aid here. The os.path module has a function called walk which, given a starting point and a function, calls the function for every directory it finds under the starting point. So the last part of my script is a function to be called for every directory in my MP3 collection. Something like:
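def processDir(arg, dirname, names):
    "Rename every file in dirname whose extension is listed in arg"
    for file in names:
        # splitext returns (root, extension); compare only the extension
        if os.path.splitext(file)[1] in arg:
            rename(dirname, file)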
You can see that I've employed the optional third argument to os.path.walk, which is handed to the function as its first parameter. In this case it is a list of file extensions, so that we only rename MP3 files. This is then called as follows:
if __name__ == "__main__":
    # songDirectory must hold the path to the top of the MP3 collection
    os.path.walk(songDirectory, processDir, [".mp3"])
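A note for anyone trying this on a current interpreter: os.path.walk was removed in Python 3, where os.walk does the same job as a generator instead of taking a callback. A minimal sketch of the equivalent traversal follows; find_files is a name chosen for this example, and matching extensions without regard to case is my own addition.

```python
import os


def find_files(root, extensions=(".mp3",)):
    """Yield (directory, filename) for each file under root with a wanted extension."""
    wanted = tuple(ext.lower() for ext in extensions)
    for directory, subdirectories, filenames in os.walk(root):
        for filename in sorted(filenames):
            # splitext returns (root, extension); compare only the extension
            if os.path.splitext(filename)[1].lower() in wanted:
                yield directory, filename
```

Each pair it yields can be passed straight to a function with the same arguments as rename above, for example in a loop such as "for directory, filename in find_files(top): rename(directory, filename)", where top is whatever path holds the collection.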
Done. Now, why did I post this here? Well, this is the most useful code I've written in a while and I thought it might be of interest to document my development process. Not least because someone might stumble across this humble weblog and tell me how to do it better, or more efficiently, or even more object-oriented-ly.
Of course, a little success has got me thinking. My next personal project will be a random playlist generator. I've already got a first cut of pseudo-code:
class Playlist: # Class name is a placeholder
    def __init__(self, length=20):
        self.trackCount = length
        self.tracks = []
        for index in range(self.trackCount):
            # select a song
            # add it to the playlist
            pass
        for track in self.tracks:
            pass # still to be decided
    def write(self, filename):
        # output self.tracks to filename
        pass
Now all I have to do is write the program.
Posted by Andy Todd at October 03, 2002 04:15 PM