September 05, 2005
Checking my file copies
I spent a fair part of the weekend rearranging the digital photo files on my hard drive. I moved from a single folder to a nice nested structure which organises the pictures by date. Before I delete the files from the original folder I wanted to ensure that I'd copied every one. So I wrote some code. Which I now present with a small request, how could I have done this better?
By which I mean, more elegantly, efficiently or more readably, with the only constraint being that I'm not going to change the language it's written in. Answers in the comments please.
def check(directoryA, directoryB):
"Check that every file in directoryA is present somewhere under directoryB"
filelist = Set()
for dirpath, dirname, filenames in os.walk(directoryB):
for file in filenames:
filelist.add(file)
missing = []
for file in os.listdir(directoryA):
# find file under directoryB
if not file.startswith('.') and file not in filelist:
# print "%s is missing" % file
missing.append(file)
return missing
Posted by Andy Todd at September 05, 2005 07:17 PM
Sort of more elegant and sort of not: this is a bit untested but it should work.
def check(dirA,dirB):
nested_files = [files for dp,dn,files in os.walk(dirB)
files_in_B = [x for subseq in nested_files for x in subseq] # flatten list
missing = [file for file in os.listdir(dirA) if file not in files_in_B]
return missing
(ah, oops, put the ] on the end of the "nested_files =" line)
Posted by: Stuart Langridge on September 5, 2005 08:42 PMMight it be worth checking that the files have the correct content? It'll take a *lot* longer, naturally, but might it be worth comparing MD5 hashes?
Posted by: Simon Brunning on September 5, 2005 09:09 PMOK Stuart, can you explain what on earth
[x for subseq in nested_files for x in subseq]
does? Because it makes me screw up my eyes and squint at it.
Posted by: Andy Todd on September 5, 2005 11:00 PMOh hang on, reading the manual [1] helps.
It's the equivalent of;
for subseq in nested_files:
for x in subseq:
.. do stuff ..
[1] http://www.python.org/peps/pep-0202.html
Posted by: Andy Todd on September 5, 2005 11:06 PMAndy: yeah, it's a nested list comprehension. I only found out about them about three days ago when I googled for "python flatten list" :)
Posted by: Stuart Langridge on September 6, 2005 05:48 PMHaven't played with it but this could take the strain: http://docs.python.org/lib/module-filecmp.html
Posted by: AndyB on September 7, 2005 03:05 AMWhy not really use sets:
# for Python 2.4
# from sets import Set as set # uncomment for 2.3
def check(directoryA, directoryB, verbose=True):
"Check that every file in directoryA is present somewhere under directoryB"
originals = set(os.listdir(directoryA))
copies = set()
for dirpath, dirname, filenames in os.walk(directoryB):
copies |= set(filenames)
missing = originals - copies
if verbose:
for name in missing:
if not name.startswith('.'):
print "%s is missing" % name
return missing