John Keeping <john@xxxxxxxxxxxxx> writes:

> When this change was originally made (0846b0c - git-remote-testpy: hash
> bytes explicitly), I didn't realise that the "hex" encoding we chose is
> a "bytes to bytes" encoding so it just fails with an error on Python 3
> in the same way as the original code.
>
> It is not possible to provide a single code path that works on Python 2
> and Python 3 since Python 2.x will attempt to decode the string before
> encoding it, which fails for strings that are not valid in the default
> encoding.  Python 3.1 introduced the "surrogateescape" error handler
> which handles this correctly and permits a bytes -> unicode -> bytes
> round-trip to be lossless.
>
> At this point Python 3.0 is unsupported so we don't go out of our way to
> try to support it.
>
> Helped-by: Michael Haggerty <mhagger@xxxxxxxxxxxx>
> Signed-off-by: John Keeping <john@xxxxxxxxxxxxx>
> ---

Thanks; will queue and wait for an Ack from Michael.

Does the helper function need to be named with leading underscore,
though?

> On Sun, Jan 27, 2013 at 02:13:29PM +0000, John Keeping wrote:
>> On Sun, Jan 27, 2013 at 05:44:37AM +0100, Michael Haggerty wrote:
>> > So to handle all of the cases across Python versions as closely as
>> > possible to the old 2.x code, it might be necessary to make the code
>> > explicitly depend on the Python version number, like:
>> >
>> >     hasher = _digest()
>> >     if sys.hexversion < 0x03000000:
>> >         pathbytes = repo.path
>> >     elif sys.hexversion < 0x03010000:
>> >         # If support for Python 3.0.x is desired (note: result can
>> >         # be different in this case than under 2.x or 3.1+):
>> >         pathbytes = repo.path.encode(sys.getfilesystemencoding(),
>> >                                      'backslashreplace')
>> >     else:
>> >         pathbytes = repo.path.encode(sys.getfilesystemencoding(),
>> >                                      'surrogateescape')
>> >     hasher.update(pathbytes)
>> >     repo.hash = hasher.hexdigest()
>
> How about this?
>
>  git-remote-testpy.py | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/git-remote-testpy.py b/git-remote-testpy.py
> index c7a04ec..16b0c52 100644
> --- a/git-remote-testpy.py
> +++ b/git-remote-testpy.py
> @@ -36,6 +36,22 @@ if sys.hexversion < 0x02000000:
>      sys.stderr.write("git-remote-testgit: requires Python 2.0 or later.\n")
>      sys.exit(1)
>  
> +
> +def _encode_filepath(path):
> +    """Encodes a Unicode file path to a byte string.
> +
> +    On Python 2 this is a no-op; on Python 3 we encode the string as
> +    suggested by [1] which allows an exact round-trip from the command line
> +    to the filesystem.
> +
> +    [1] http://docs.python.org/3/c-api/unicode.html#file-system-encoding
> +
> +    """
> +    if sys.hexversion < 0x03000000:
> +        return path
> +    return path.encode('utf-8', 'surrogateescape')
> +
> +
>  def get_repo(alias, url):
>      """Returns a git repository object initialized for usage.
>      """
> @@ -45,7 +61,7 @@ def get_repo(alias, url):
>      repo.get_head()
>  
>      hasher = _digest()
> -    hasher.update(repo.path.encode('hex'))
> +    hasher.update(_encode_filepath(repo.path))
>      repo.hash = hasher.hexdigest()
>  
>      repo.get_base_path = lambda base: os.path.join(
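
For anyone who wants to check the round-trip claim in the commit message
outside of the remote helper, here is a minimal standalone sketch.  It is
not part of the patch: encode_filepath() merely mirrors the quoted
_encode_filepath(), path_hash() and hex_codec_fails() are hypothetical
helpers, and hashlib.sha1 is an assumption, since the quoted hunks do not
show what _digest() actually is.

```python
import hashlib
import sys


def encode_filepath(path):
    """Stand-in for the patch's _encode_filepath() (name is illustrative)."""
    if sys.hexversion < 0x03000000:
        # Python 2: str is already a byte string, so nothing to do.
        return path
    # Python 3.1+: lone surrogates produced by 'surrogateescape' decoding
    # are turned back into the original undecodable bytes.
    return path.encode('utf-8', 'surrogateescape')


def path_hash(path):
    """Hash the encoded path.  sha1 is assumed here; the real code uses
    _digest(), whose definition is outside the quoted hunks."""
    hasher = hashlib.sha1()
    hasher.update(encode_filepath(path))
    return hasher.hexdigest()


def hex_codec_fails(text):
    """Return True if text.encode('hex') raises, as it does on Python 3
    where 'hex' is a bytes-to-bytes codec rather than a text encoding."""
    try:
        text.encode('hex')
    except (LookupError, TypeError):
        return True
    return False


if sys.hexversion >= 0x03010000:
    # A path containing a byte that is not valid UTF-8.
    raw = b'/tmp/repo-\xff'
    # Decode the way Python 3 does for undecodable command-line bytes
    # (assuming a UTF-8 locale for the purpose of this sketch).
    text = raw.decode('utf-8', 'surrogateescape')
    # bytes -> unicode -> bytes is lossless with 'surrogateescape'.
    assert encode_filepath(text) == raw
    # The same string cannot be encoded with the default strict handler.
    try:
        text.encode('utf-8')
        strict_failed = False
    except UnicodeEncodeError:
        strict_failed = True
    assert strict_failed
```

Under Python 3 the hash then depends only on the original bytes of the
path, which is the property the old Python 2 code had.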