Bugzilla – Bug 481
OS X filename characterset oddness
Last modified: 2012-05-13 05:00:53 EDT
just to note: if a filename contains an international character (in this case '\xe4') and you try to update (mercurial 0.9.3) on a MacOS X 10.3 with the included python 2.3 you'll get the appended misleading error message. No clue which part is responsible for this 'feature'. Thanks to all programmers of mercurial, I like it very much.
$ hg up
Traceback (most recent call last):
  File "/Users/hlauer/bin/hg", line 12, in ?
    commands.run()
  File "/Users/hlauer/lib/python/mercurial/commands.py", line 3000, in run
    sys.exit(dispatch(sys.argv[1:]))
  File "/Users/hlauer/lib/python/mercurial/commands.py", line 3223, in dispatch
    return d()
  File "/Users/hlauer/lib/python/mercurial/commands.py", line 3182, in <lambda>
    d = lambda: func(u, repo, *args, **cmdoptions)
  File "/Users/hlauer/lib/python/mercurial/commands.py", line 2508, in update
    return hg.update(repo, node)
  File "/Users/hlauer/lib/python/mercurial/hg.py", line 238, in update
    stats = _merge.update(repo, node, False, False, None, None)
  File "/Users/hlauer/lib/python/mercurial/merge.py", line 485, in update
    stats = applyupdates(repo, action, wc, p2)
  File "/Users/hlauer/lib/python/mercurial/merge.py", line 358, in applyupdates
    repo.wwrite(f, t)
  File "/Users/hlauer/lib/python/mercurial/localrepo.py", line 517, in wwrite
    return self.wopener(filename, 'w').write(data)
  File "/Users/hlauer/lib/python/mercurial/util.py", line 1027, in o
    return posixfile(f, mode)
IOError: invalid mode: wb
Wow, I've never seen file(..) not accepting 'wb' as a mode. Quoting the Python documentation:
"When opening a binary file, you should append 'b' to the mode value to open the file in binary mode, which will improve portability. (Appending 'b' is useful even on systems that don't treat binary and text files differently, where it serves as documentation.) See below for more possible values of mode."
Minimal example that reproduces the error message. Doesn't look like a Mercurial problem in the first place...

Python 2.3 (#1, Sep 13 2003, 00:49:11)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> file('/tmp/t\xe4st','w')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: invalid mode: w
>>> file('/tmp/test','w')
<open file '/tmp/test', mode 'w' at 0x60320>
which filesystem is that ?
File System : Mac OS Extended (Journaled)
Connection Bus : ATA
IO Content : Apple_HFS
Maybe you can check if other python versions (2.4 or 2.5) solve this?
I can reproduce this on MacOS 10.4.10, running Python 2.4.4 and 2.5.1 from MacPorts, and running Mercurial 0.9.4. It seems only files with 'odd' characters trigger this bug - if I create a file called 'ÆØÅ' on a Linux-machine (with a shell using ISO-8859-1 encoding), and add it to the repository, the repo becomes un-update-able on the Mac. Removing the file from the repository again fixes this. I also get 'abort: Invalid Argument: /Users/kenneth/......', raised from dirstate.py:461, when trying to do an 'hg status' on something containing international characters. I access my repositories via SSH, from the same Linux-machine mentioned earlier.
The minimal example (from helauer 2007-01-15) works correctly with the built-in Python 2.5.1 on OS X 10.5:

Python 2.5.1 (r251:54863, Jul 21 2007, 02:46:44)
[GCC 4.0.1 (Apple Inc. build 5464)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> file('/tmp/t\xe4st','w')
<open file '/tmp/t?st', mode 'w' at 0x5c410>

Are Python strings composed of Unicode characters [as in Java or Cocoa] or of bytes? If the latter, then you need to be very careful to convert correctly between different encodings. Mac filesystem APIs use UTF-8.
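To make the bytes-versus-characters question concrete, here is a small sketch. In the Python 2 versions used in this thread a plain string literal is a byte string; the sketch is written for Python 3, where the two types are kept apart explicitly. It assumes the byte 0xE4 in the file name was meant as ISO-8859-1 'ä', which the thread does not confirm.

```python
# Assumption: the byte 0xE4 in the file name is ISO-8859-1 (Latin-1) for 'ä'.
raw_name = b"t\xe4st"

# Decode the raw bytes into a string of Unicode characters.
text_name = raw_name.decode("iso-8859-1")

# Re-encode the characters as UTF-8, the encoding the comment above
# says the Mac filesystem APIs use.
utf8_name = text_name.encode("utf-8")

print(repr(text_name))
print(repr(utf8_name))
```

The same characters end up as different byte sequences depending on the encoding, which is why a name written on one system can be unusable on another.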
fixing up title

From the Py2.5.1 output below, I would expect things to work with newer Python. Can we double-check that?
I have no problem with MacOS 10.4.11 and builtin python 2.5. Supplied example does not work for me but that's expected, my console is UTF-8 and the file name is not a valid UTF-8 sequence. I can create and handle files with non-latin1 characters.
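The point about the name not being a valid UTF-8 sequence can be checked directly. A minimal sketch for Python 3; is_valid_utf8 is a hypothetical helper name, not something from Mercurial or the standard library:

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# 0xE4 starts a three-byte UTF-8 sequence, but 's' is not a continuation byte.
print(is_valid_utf8(b"/tmp/t\xe4st"))

# The same name with 'ä' encoded as UTF-8 (0xC3 0xA4).
print(is_valid_utf8(b"/tmp/t\xc3\xa4st"))
```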
Oh, so the outcome of my short example depends on the language settings of the Python process environment. Did newer OS X bundled Python versions at least give a better error message (something like 'invalid filename' instead of 'invalid file mode')? What's the supposed way for Mercurial to deal with different filename encodings in the repo? (My report came from adding a file on Linux, probably with ISO Latin-1 or -15 encoding, and then being unable to pull to OS X.)
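As an illustration only, and not a description of how Mercurial handles this, one possible way to cope with a name of unknown origin is to keep it if it is already valid UTF-8 and otherwise transcode it from an assumed source encoding. The helper name transcode_name and the default of ISO-8859-15 are assumptions made for this sketch (Python 3):

```python
def transcode_name(raw: bytes, assumed_encoding: str = "iso-8859-15") -> bytes:
    """Return raw as UTF-8 bytes.

    Hypothetical helper, not a Mercurial API. assumed_encoding is a guess
    about where the name came from and must be supplied by the user.
    """
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8, leave untouched
    except UnicodeDecodeError:
        # Not UTF-8: reinterpret the bytes in the assumed encoding.
        return raw.decode(assumed_encoding).encode("utf-8")


# 'ÆØÅ' as written by a shell using ISO-8859-1.
print(repr(transcode_name(b"\xc6\xd8\xc5")))
```

A guess like this cannot be verified from the bytes alone, so it is only safe when the source encoding of the repository's file names is actually known.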
Assuming fixed by more recent Python.
--- Bug imported by bugzilla@serpentine.com 2012-05-12 08:39 EDT --- This bug was previously known as _bug_ 481 at http://mercurial.selenic.com/bts/issue481