Bug 481 - OS X filename characterset oddness
Status: RESOLVED FIXED
Assigned To: Bugzilla

Reported: 2007-01-15 04:57 EST by Helauer
Modified: 2012-05-13 05:00 EDT
7 users

Description Helauer 2007-01-15 04:57:12 EST
Just to note: if a filename contains an international character (in this case
'\xe4') and you try to update (Mercurial 0.9.3) on Mac OS X 10.3 with the
bundled Python 2.3, you get the misleading error message appended below. I have
no clue which part is responsible for this 'feature'.

Thanks to all the Mercurial developers; I like it very much.
Comment 1 Helauer 2007-01-15 04:58:37 EST
$ hg up

Traceback (most recent call last):
  File "/Users/hlauer/bin/hg", line 12, in ?
    commands.run()
  File "/Users/hlauer/lib/python/mercurial/commands.py", line 3000, in run
    sys.exit(dispatch(sys.argv[1:]))
  File "/Users/hlauer/lib/python/mercurial/commands.py", line 3223, in dispatch
    return d()
  File "/Users/hlauer/lib/python/mercurial/commands.py", line 3182, in <lambda>
    d = lambda: func(u, repo, *args, **cmdoptions)
  File "/Users/hlauer/lib/python/mercurial/commands.py", line 2508, in update
    return hg.update(repo, node)
  File "/Users/hlauer/lib/python/mercurial/hg.py", line 238, in update
    stats = _merge.update(repo, node, False, False, None, None)
  File "/Users/hlauer/lib/python/mercurial/merge.py", line 485, in update
    stats = applyupdates(repo, action, wc, p2)
  File "/Users/hlauer/lib/python/mercurial/merge.py", line 358, in applyupdates
    repo.wwrite(f, t)
  File "/Users/hlauer/lib/python/mercurial/localrepo.py", line 517, in wwrite
    return self.wopener(filename, 'w').write(data)
  File "/Users/hlauer/lib/python/mercurial/util.py", line 1027, in o
    return posixfile(f, mode)
IOError: invalid mode: wb
Comment 2 Benoit Boissinot 2007-01-15 06:41:35 EST
Wow, I've never seen file(...) reject 'wb' as a mode.

Quoting python documentation:
When opening a binary file, you should append 'b' to the mode value to open the
file in binary mode, which will improve portability. (Appending 'b' is useful
even on systems that don't treat binary and text files differently, where it
serves as documentation.) See below for more possible values of mode.
Comment 3 Helauer 2007-01-15 06:53:54 EST
Minimal example that reproduces the error message. It doesn't look like
a Mercurial problem in the first place...

Python 2.3 (#1, Sep 13 2003, 00:49:11) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> file('/tmp/t\xe4st','w')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
IOError: invalid mode: w
>>> file('/tmp/test','w')
<open file '/tmp/test', mode 'w' at 0x60320>
Comment 4 Benoit Boissinot 2007-01-15 06:57:48 EST
Which filesystem is that?
Comment 5 Helauer 2007-01-15 07:05:15 EST
File System :     Mac OS Extended (Journaled)
Connection Bus :  ATA
IO Content :      Apple_HFS
Comment 6 Thomas Arendsen Hein 2007-03-04 15:12:38 EST
Could you check whether other Python versions (2.4 or 2.5) solve this?
Comment 7 Kenneth Schmidt 2007-07-14 18:41:47 EDT
I can reproduce this on Mac OS X 10.4.10, running Python 2.4.4 and 2.5.1 from
MacPorts, with Mercurial 0.9.4.

It seems only files with 'odd' characters trigger this bug: if I create a file
called 'ÆØÅ' on a Linux machine (with a shell using ISO-8859-1 encoding) and
add it to the repository, the repository can no longer be updated on the Mac.
Removing the file from the repository makes it updatable again.

I also get 'abort: Invalid Argument: /Users/kenneth/......', raised from
dirstate.py:461, when running 'hg status' on a path containing international
characters.

I access my repositories via SSH, from the same Linux-machine mentioned earlier.
Comment 8 Jens Alfke 2007-07-25 22:55:07 EDT
The minimal example (from Helauer, 2007-01-15) works correctly with the built-in
Python 2.5.1 on OS X 10.5:

Python 2.5.1 (r251:54863, Jul 21 2007, 02:46:44) 
[GCC 4.0.1 (Apple Inc. build 5464)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> file('/tmp/t\xe4st','w')
<open file '/tmp/t?st', mode 'w' at 0x5c410>

Are Python strings composed of Unicode characters [as in Java or Cocoa] or of 
bytes? If the latter, then you need to be very careful to convert correctly 
between different encodings. Mac filesystem APIs use UTF-8.
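Jens's point about bytes versus characters can be illustrated with a short sketch. It uses Python 3, where `bytes` and `str` are separate types (in the Python 2 versions discussed in this report, `str` is a byte string), and it assumes the file name was created on a Latin-1 (ISO-8859-1) system as described in comment 7. `latin1_to_utf8` is a hypothetical helper for illustration, not part of Mercurial.

```python
# Sketch: the same file name as raw bytes in two encodings.
# Assumption: the name was created on an ISO-8859-1 system and must be
# handed to a filesystem API that expects UTF-8.

latin1_name = b"t\xe4st"  # 'täst' encoded as ISO-8859-1: 'ä' is the single byte 0xE4


def latin1_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: reinterpret Latin-1 bytes as text, re-encode as UTF-8."""
    # Every byte value 0x00-0xFF maps to a Latin-1 character, so this never fails.
    text = raw.decode("latin-1")
    return text.encode("utf-8")


utf8_name = latin1_to_utf8(latin1_name)
print(utf8_name)  # b't\xc3\xa4st': 'ä' becomes the two-byte sequence C3 A4

# The original Latin-1 bytes are not a valid UTF-8 sequence on their own.
try:
    latin1_name.decode("utf-8")
except UnicodeDecodeError as exc:
    print("not UTF-8:", exc.reason)
```

The conversion only works if the source encoding is known; the bytes themselves do not record whether they were meant as Latin-1, Latin-9 or something else.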
Comment 9 Matt Mackall 2007-12-08 01:36:38 EST
Fixing up the title.

From the Python 2.5.1 output above, I would expect things to work with newer
Python versions. Can we double-check that?
Comment 10 Patrick Mézard 2007-12-09 14:47:16 EST
I have no problems on Mac OS X 10.4.11 with the built-in Python 2.5.

The supplied example does not work for me, but that's expected: my console is
UTF-8 and the file name is not a valid UTF-8 sequence. I can create and handle
files with non-Latin-1 characters.
Comment 11 Helauer 2007-12-10 05:43:31 EST
Oh, so the outcome of my short example depends on the language settings of
the Python process environment. Do newer Python versions bundled with OS X at
least give a better error message (something like 'invalid filename' instead
of 'invalid file mode')? How is Mercurial supposed to deal with different
filename encodings in the repository? (My report came from adding a file on
Linux, probably with ISO-8859-1 or -15 encoding, and then being unable to pull
it to OS X.)
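Comment 11 asks for an error like 'invalid filename' instead of 'invalid mode'. One way an application could produce that is to validate the byte name against the filesystem's expected encoding before opening it. This is a minimal Python 3 sketch under the assumption (from comment 8) that the target filesystem expects UTF-8 names; `open_for_write` and `InvalidFilenameError` are hypothetical names, not Mercurial or Python APIs, and this is not how Mercurial actually handles the problem.

```python
class InvalidFilenameError(ValueError):
    """Hypothetical: raised when a byte file name is not valid in the expected encoding."""


def open_for_write(path: bytes, encoding: str = "utf-8"):
    """Hypothetical helper: open `path` for binary writing after checking its encoding.

    Instead of letting the platform report a confusing 'invalid mode' error,
    reject names that are not valid in `encoding` and say which byte is wrong.
    """
    try:
        path.decode(encoding)
    except UnicodeDecodeError as exc:
        raise InvalidFilenameError(
            "invalid filename %r: byte 0x%02x at position %d is not valid %s"
            % (path, path[exc.start], exc.start, encoding)
        ) from exc
    # open() accepts a bytes path on POSIX systems.
    return open(path, "wb")


if __name__ == "__main__":
    try:
        open_for_write(b"/tmp/t\xe4st")
    except InvalidFilenameError as err:
        print(err)
```

For the name from comment 3, the error points at byte 0xe4 in the file name rather than at the mode. Rejecting the name is only one policy; transcoding it is another, but that requires knowing which encoding the name was created in.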
Comment 12 Matt Mackall 2009-02-15 18:21:30 EST
Assuming fixed by more recent Python.
Comment 13 Bugzilla 2012-05-12 08:39:10 EDT

--- Bug imported by bugzilla@serpentine.com 2012-05-12 08:39 EDT  ---

This bug was previously known as bug 481 at http://mercurial.selenic.com/bts/issue481