Andrew Channels Dexter Pinion: A little light munging

October 03, 2002

A little light munging

I love my new Archos Jukebox. But it was only after putting about a hundred CDs' worth of music on it that I found a drawback.

By default Musicmatch Jukebox, the MP3 writing software supplied with the unit, puts the tracks of each album in a directory of their own and groups the album directories by artist. The file name of each individual track is then simply the track title. Simple, really. When you play these tracks in Musicmatch Jukebox you can select them in track order, but when you play them on your Archos Jukebox the tracks are played in alphabetical file name order. Which is fine, but it does ruin the enjoyment of a quality concept album like Misplaced Childhood.

Rather than change the settings in Musicmatch and then re-record all one hundred plus CDs I decided on a radical plan. I would prepend the track number to each filename on my Jukebox. So instead of "Lavender.mp3" the file would be called "03_Lavender.mp3". A task of mere minutes I thought, because I have Python at my command. In the end it was quite a simple task, but the process of writing that simple solution was a bit of a voyage of discovery which I thought I would share.

Luckily for our purposes each MP3 file has some metadata encoded in it by the burning software. Chapter three of Dive Into Python has an example of how to read the ID3v1.0 style tags in your MP3 files. After borrowing this code I realised the flaw in my cunning plan: there is no track number in the ID3v1.0 tag. A little investigation (at id3.org) determined that I should be looking at the ID3v2 tag in my files.
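To make the flaw concrete, here is a minimal sketch (in present-day Python 3, and not the Dive Into Python code itself) of reading an ID3v1.0 tag. It assumes the usual layout of a 128 byte block at the very end of the file, starting with "TAG". The function name read_id3v1 is made up for illustration. Note that none of the fixed-width fields holds a track number.

```python
def read_id3v1(data):
    """Return the ID3v1.0 fields found at the end of data (bytes), or None."""
    tag = data[-128:]
    if len(tag) < 128 or tag[:3] != b"TAG":
        return None  # no ID3v1 tag present

    def text(start, end):
        # Fields are fixed width and padded with NUL bytes or spaces
        return tag[start:end].split(b"\x00")[0].decode("latin-1").strip()

    return {
        "title": text(3, 33),
        "artist": text(33, 63),
        "album": text(63, 93),
        "year": text(93, 97),
        "comment": text(97, 127),
        "genre": tag[127],  # a single byte, an index into a fixed genre list
    }
```

Passing it the bytes of a whole MP3 file (opened in binary mode) should give back a dictionary of those six fields and nothing else.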

Sure enough, with a little experimentation and minimal cursing I found out how to get the relevant parts of the ID3 tag, namely the track title and track number. It's as simple as:

import os
import struct

def getTagData(directoryName, fileName):
    "Return track number and title from ID3v2 tag of fileName"
    # Open in binary mode, the tag is raw bytes rather than text
    mp3File = open(os.path.join(directoryName, fileName), "rb")
    tagHeader = mp3File.read(1024) # If it's a large header this won't be enough
    mp3File.close()
    trackNumber = 0 # Nice default
    trackName = ""
    # Get the track number. Note that find() returns -1, not 0, on failure
    numberPosition = tagHeader.find(b"TRCK")
    if numberPosition != -1:
        # Skip the 10 byte frame header and the text encoding byte
        start = numberPosition + 11
        end = numberPosition + 13
        trackNo = tagHeader[start:end]
        if trackNo[1:2] == b"T":
            # Single digit, the "T" is the start of the next frame name
            trackNo = trackNo[:1]
        try:
            trackNumber = int(trackNo)
        except ValueError:
            trackNumber = 0
    # Get the track title to a maximum of 255 bytes
    titlePosition = tagHeader.find(b"TIT2")
    if titlePosition != -1 and len(tagHeader) >= titlePosition + 8:
        start = titlePosition + 10
        startLength = titlePosition + 4
        endLength = titlePosition + 8
        # 'B' is an unsigned byte, so lengths above 127 don't go negative
        length = struct.unpack('BBBB', tagHeader[startLength:endLength])
        end = start + length[3] # Only need other components if the title is longer than 255 bytes
        # Assumes the title is stored as ISO-8859-1 text
        trackName = tagHeader[start:end].replace(b"\x00", b" ").strip().decode("latin-1")
    # All done, return to our calling function
    return trackNumber, trackName

Easy really. All you really need to know is that an ID3v2 tag can hold as many (or as few) fields as you like, which the specification calls frames. Each frame is identified by a name; the two we are interested in here are "TRCK" for track number and "TIT2" for track title. Rather than fiendishly slice up the entire tag I just assume these frames are in the first 1024 bytes of the file and then search for those strings. What immediately follows the frame identifier varies from frame to frame, but you should be able to infer the details of the two frames we are interested in from the preceding code. If not, have a look at the website.
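For anyone who would rather not infer it, here is a minimal sketch (in present-day Python 3) of walking the frames properly instead of searching for strings. It assumes the ID3v2.3 layout: a 10 byte tag header, then frames that each start with a four character name, a four byte big-endian size and two flag bytes. It also assumes there is no extended header and only decodes ISO-8859-1 text frames. The name read_text_frames is made up for illustration.

```python
import struct

def read_text_frames(data, wanted=(b"TRCK", b"TIT2")):
    """Walk the ID3v2.3 frames at the start of data, returning {name: text}."""
    found = {}
    # Tag header: "ID3", two version bytes, one flags byte, four size bytes
    if len(data) < 10 or data[:3] != b"ID3":
        return found
    # The tag size is "syncsafe": only the low 7 bits of each byte count
    tag_size = 0
    for byte in data[6:10]:
        tag_size = (tag_size << 7) | (byte & 0x7F)
    pos, end = 10, min(10 + tag_size, len(data))
    while pos + 10 <= end:
        # Frame header: four character name, big-endian size, two flag bytes
        frame_id, size, _flags = struct.unpack(">4sIH", data[pos:pos + 10])
        if frame_id == b"\x00\x00\x00\x00":
            break  # reached the padding after the last frame
        body = data[pos + 10:pos + 10 + size]
        if frame_id in wanted and body[:1] == b"\x00":
            # An encoding byte of 0 means ISO-8859-1 text follows
            text = body[1:].decode("latin-1").rstrip("\x00")
            found[frame_id.decode("ascii")] = text
        pos += 10 + size
    return found
```

Because it follows the sizes from frame to frame, this version does not care where in the tag the two frames sit, although it does need to be given the whole tag rather than the first 1024 bytes.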

Having mastered the ability to read the tags and garner the information needed to rename each file I had to perform the change. In Python, this is a cinch:

def rename(directoryName, fileName):
    trackNumber, trackName = getTagData(directoryName, fileName)
    if trackNumber == 0 or not trackName:
        return # We haven't picked up the tag information
    # Zero-pad to two digits so that track 10 sorts after track 09
    modifiedFileName = '%02d_%s.mp3' % (trackNumber, trackName)
    if modifiedFileName != fileName:
        print("New : %s\nOld : %s" % (modifiedFileName, fileName))
        os.rename(os.path.join(directoryName, fileName), os.path.join(directoryName, modifiedFileName))
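One thing the function above does not guard against is a track title containing characters that cannot appear in a file name. As a minimal sketch (present-day Python 3), the naming rule could be pulled out into a helper of its own. The name build_filename and the choice of "_" as a replacement are assumptions for illustration, and the character list is the set that Windows-style file systems reject.

```python
import re

def build_filename(track_number, title, extension=".mp3"):
    """Return a name such as '03_Lavender.mp3' for a track number and title."""
    # Replace characters that Windows-style file systems do not allow
    safe_title = re.sub(r'[\\/:*?"<>|]', "_", title).strip()
    # %02d zero-pads, so "02" sorts before "10" in alphabetical order
    return "%02d_%s%s" % (track_number, safe_title, extension)
```

Keeping the rule in one small function also makes it easy to try out on a few titles before letting it loose on a hundred CDs.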

With these two building blocks it's a simple matter to go through my entire MP3 collection. Well, almost. First we have to find them all. If you remember from the top of this piece, my collection is organised into a hierarchy of directories.

The Python library comes to our aid here. The os.path module has a function called walk which, given a starting point and a function, calls the function for every directory it finds under the starting point. So the last part of my script is a function to be called for every directory in my MP3 collection. Something like:

def processDir(arg, dirname, names):
    # arg is the list of extensions handed to os.path.walk
    for name in names:
        if os.path.splitext(name)[1].lower() in arg:
            rename(dirname, name)

You can see that I've made use of the third argument to os.path.walk, which is handed unchanged to the function as its first parameter. In this case it is a list of file extensions, so that we only rename MP3 files. This is then called as follows:

if __name__ == "__main__":
    # songDirectory holds the path to the top of the MP3 collection
    os.path.walk(songDirectory, processDir, [".mp3"])
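A note for anyone reading this with a current interpreter: os.path.walk no longer exists in Python 3, where os.walk does the same job without a callback. A minimal sketch of the equivalent traversal follows. The name find_tracks and its extensions parameter are made up for illustration.

```python
import os

def find_tracks(top, extensions=(".mp3",)):
    """Yield (directory, file name) for every matching file under top."""
    for dirname, _subdirs, filenames in os.walk(top):
        # Sorted so that files are handled in a predictable order
        for name in sorted(filenames):
            # Compare in lower case so that "SONG.MP3" is picked up too
            if os.path.splitext(name)[1].lower() in extensions:
                yield dirname, name
```

Each pair it yields could then be handed to rename(dirname, name) in an ordinary for loop.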

Done. Now, why did I post this here? Well, this is the most useful code I've written in a while and I thought it might be of interest to document my development process. Not least because someone might stumble across this humble weblog and tell me how to do it better, or more efficiently, or even more object-oriented-ly.

Of course, a little success has got me thinking. My next personal project will be a random playlist generator. I've already got a first cut of pseudo-code:

class archosPlaylist:
    def __init__(self, length=20):
        self.trackCount = length
        self.tracks = []
    def getTracks(self):
        for index in range(self.trackCount):
            # select a song
            # add it to the playlist
            pass
    def showTracks(self):
        for track in self.tracks:
            print(track.name)
    def write(self, filename):
        # output self.tracks to filename
        pass

Now all I have to do is write the program.

Posted by Andy Todd at October 03, 2002 04:15 PM

Comments

Andy, I have been trying for *days* to get this archos to work with Linux, and Debian in specific.
I have an archos jukebox studio 10.
Short version: can you outline what kernel selections you needed and what packages you installed to get it to work.

Long Version: I have tried 2.5.45, 2.5.54, 2.4.20-k7 (a Debian kernel).
When I try to load the usb-storage module, I get unresolved symbols (at best). At worst, I get absolutely nothing, no errors, like the command never was typed, 0.
I've been told I do need the ide/ATA support, and that I do not need ata/ide support.
I have the hotplug package installed, and all I get is "can't synthesize PCI events" and then nothing, no daemon, zilch. As far as syslog is concerned, the archos doesn't exist, but I finally got it to admit that there is a USB onboard. Really, I'm just scratching the surface of what I have done to get this damn thing to work. Can you please give me any tips/suggestions you may have? I'm on openprojects.net in both #debian and #rockbox (the (more or less) archos jukebox channel).
nick = wethion & wethion_ respectively.

Thanks.
Vale

Posted by: Vale on January 7, 2003 10:33 PM
