9th May, 2008

Opening a file in Python

Filed under: General — admin @ 9:01 pm

I’m sure I read this somewhere recently, but my scratchy memory and command of Google can’t bring it back to me.

Is there a Python idiom for accepting either a file name or a file object as a function parameter?

The closest I can get is this;

def my_function(file_name_or_object):
    try:
        open(file_name_or_object)
    except TypeError:
        file = file_name_or_object
    return file

Any improvements on this are more than welcome.

18 Comments

  1. def my_function(file_name_or_object):
    # parameter is string
    if isinstance(file_name_or_object, (str, unicode)):
    return open(file_name_or_object)

    # parameter is an file like object that has at least a read method
    elif hasattr(file_name_or_object, “read”):
    return file_name_or_object

    Comment by me — 09/05/2008 @ 9:37 pm

  2. def my_function(file_name_or_object):
    # parameter is string
    if isinstance(file_name_or_object, (str, unicode)):
    return open(file_name_or_object)

    # parameter is an file like object that has at least a read method
    elif hasattr(file_name_or_object, “read”):
    return file_name_or_object

    Comment by me — 09/05/2008 @ 9:37 pm

  3. What about this? It would depend of course in how “filelike” the object has to be. I don’t know whether it is considered the most idiomatic, but it’s used for example in xml.etree.Elementree for the parse function.

    if not hasattr(file_name_or_object, ‘read’):
    filename_or_object = open(file_name_or_object)
    return filename_or_object

    Comment by Steven — 09/05/2008 @ 9:51 pm

  4. What about this? It would depend of course in how “filelike” the object has to be. I don’t know whether it is considered the most idiomatic, but it’s used for example in xml.etree.Elementree for the parse function.

    if not hasattr(file_name_or_object, ‘read’):
    filename_or_object = open(file_name_or_object)
    return filename_or_object

    Comment by Steven — 09/05/2008 @ 9:51 pm

  5. @me: why do you use ‘hasattr’ with ‘elif’? Why not simply use ‘isinstance(file_name_or_object,file)’? Is it faster? I’m asking because many classes can have ‘read’ method, not necessarily being ‘file-like’.

    Comment by vArDo — 09/05/2008 @ 10:02 pm

  6. @me: why do you use ‘hasattr’ with ‘elif’? Why not simply use ‘isinstance(file_name_or_object,file)’? Is it faster? I’m asking because many classes can have ‘read’ method, not necessarily being ‘file-like’.

    Comment by vArDo — 09/05/2008 @ 10:02 pm

  7. I agree that in this circumstance a type check ( isinstance(file_name_or_object, basestring) ) is perfectly acceptable. ConfigObj does this. (Particularly as there is no ‘string protocol’ and so any filename will have to be passed in as a real string or at least a subclass.)

    Michael

    Comment by Michael Foord — 09/05/2008 @ 10:02 pm

  8. I agree that in this circumstance a type check ( isinstance(file_name_or_object, basestring) ) is perfectly acceptable. ConfigObj does this. (Particularly as there is no ‘string protocol’ and so any filename will have to be passed in as a real string or at least a subclass.)

    Michael

    Comment by Michael Foord — 09/05/2008 @ 10:02 pm

  9. Isn’t the try/except to be preferred over isinstance/hasattr? Or will it lead to bad logic?
    The OP’s case will assume its a file object if it is not open-able; as opposed to assuming it is a file object if it is an instance of basestring.

    I suspect that in this case it amounts to the same thing if open itself only works if given an instance of basestring, but style-wise I thought it was preferable to limit the use of isinstance/hasattr?

    Hell, I’m nit-picking. either would work!

    - Paddy.

    Comment by Paddy3118 — 09/05/2008 @ 11:55 pm

  10. Isn’t the try/except to be preferred over isinstance/hasattr? Or will it lead to bad logic?
    The OP’s case will assume its a file object if it is not open-able; as opposed to assuming it is a file object if it is an instance of basestring.

    I suspect that in this case it amounts to the same thing if open itself only works if given an instance of basestring, but style-wise I thought it was preferable to limit the use of isinstance/hasattr?

    Hell, I’m nit-picking. either would work!

    - Paddy.

    Comment by Paddy3118 — 09/05/2008 @ 11:55 pm

  11. If the code is only going to read the data in the file a line at a time then I would say it is far better to specify that the function takes an iterable sequence of strings, so that the client code is responsible for creating the file object and passing it in. This way the client could equally give the function a urlib object, or a StringIO object, or a list of strings, or a generator, or anything else that the user can come up with.

    This not only makes the function more flexible and useful, it also makes it much easier to test. I have lots of tests where the production version of the code takes a file object, but the unit tests pass in something like this:

    testdata = ”’
    lines
    of test
    data
    ”’.splitlines()

    result = myFunction(testdata)
    assert result == expected

    This makes the test self-contained, instead of being split across lots of supplementary files.

    Having a function that takes either a string or a filename is a code smell IMHO – if you must allow either then have two separate functions, the first takes a file (or string iterable), and the second takes a filename, opens the file and calls the first function.

    e.g.
    def doStuff(fileObj):
    # do stuff with the file

    def openFileAndDoStuff(filename):
    doStuff(open(filename))

    Comment by Dave Kirby — 10/05/2008 @ 12:09 am

  12. If the code is only going to read the data in the file a line at a time then I would say it is far better to specify that the function takes an iterable sequence of strings, so that the client code is responsible for creating the file object and passing it in. This way the client could equally give the function a urlib object, or a StringIO object, or a list of strings, or a generator, or anything else that the user can come up with.

    This not only makes the function more flexible and useful, it also makes it much easier to test. I have lots of tests where the production version of the code takes a file object, but the unit tests pass in something like this:

    testdata = ”’
    lines
    of test
    data
    ”’.splitlines()

    result = myFunction(testdata)
    assert result == expected

    This makes the test self-contained, instead of being split across lots of supplementary files.

    Having a function that takes either a string or a filename is a code smell IMHO – if you must allow either then have two separate functions, the first takes a file (or string iterable), and the second takes a filename, opens the file and calls the first function.

    e.g.
    def doStuff(fileObj):
    # do stuff with the file

    def openFileAndDoStuff(filename):
    doStuff(open(filename))

    Comment by Dave Kirby — 10/05/2008 @ 12:09 am

  13. You could use named arguments

    def my_function(file_name=None, file_object=None):
    ….if file_object == None:
    ……..file_object = open(file_name’, ‘r’)
    ….# Do stuff with file_object

    my_function(file_name=’test.txt’)
    my_function(file_object=open(‘test.txt’,'r’)

    Comment by Snazz — 10/05/2008 @ 12:43 am

  14. You could use named arguments

    def my_function(file_name=None, file_object=None):
    ….if file_object == None:
    ……..file_object = open(file_name’, ‘r’)
    ….# Do stuff with file_object

    my_function(file_name=’test.txt’)
    my_function(file_object=open(‘test.txt’,'r’)

    Comment by Snazz — 10/05/2008 @ 12:43 am

  15. Another version:

    def getfile(obj):
    ….if all(hasattr(obj, attr) for attr in ['read', 'seek', 'write']):
    ……..return obj
    ….elif isinstance(obj, basestring) and os.path.isfile(obj):
    ……..return file(obj)
    ….else:
    ……..raise ValueError(“Not a file-like object or valid path.”)

    Comment by Steve Kryskalla — 10/05/2008 @ 3:12 am

  16. Another version:

    def getfile(obj):
    ….if all(hasattr(obj, attr) for attr in ['read', 'seek', 'write']):
    ……..return obj
    ….elif isinstance(obj, basestring) and os.path.isfile(obj):
    ……..return file(obj)
    ….else:
    ……..raise ValueError(“Not a file-like object or valid path.”)

    Comment by Steve Kryskalla — 10/05/2008 @ 3:12 am

  17. I think you could also turn this type of function into a decorator that will automatically coerce an argument of the wrapped function.

    Comment by Steve Kryskalla — 10/05/2008 @ 3:16 am

  18. I think you could also turn this type of function into a decorator that will automatically coerce an argument of the wrapped function.

    Comment by Steve Kryskalla — 10/05/2008 @ 3:16 am

RSS feed for comments on this post.

Sorry, the comment form is closed at this time.

Powered by WordPress