Re: [PATCH] git-remote-testpy: fix patch hashing on Python 3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



John Keeping <john@xxxxxxxxxxxxx> writes:

> When this change was originally made (0846b0c - git-remote-testpy: hash
> bytes explicitly , I didn't realised that the "hex" encoding we chose is
> a "bytes to bytes" encoding so it just fails with an error on Python 3
> in the same way as the original code.
>
> It is not possible to provide a single code path that works on Python 2
> and Python 3 since Python 2.x will attempt to decode the string before
> encoding it, which fails for strings that are not valid in the default
> encoding.  Python 3.1 introduced the "surrogateescape" error handler
> which handles this correctly and permits a bytes -> unicode -> bytes
> round-trip to be lossless.
>
> At this point Python 3.0 is unsupported so we don't go out of our way to
> try to support it.
>
> Helped-by: Michael Haggerty <mhagger@xxxxxxxxxxxx>
> Signed-off-by: John Keeping <john@xxxxxxxxxxxxx>
> ---

Thanks; will queue and wait for an Ack from Michael.

Does the helper function need to be named with leading underscore,
though?

> On Sun, Jan 27, 2013 at 02:13:29PM +0000, John Keeping wrote:
>> On Sun, Jan 27, 2013 at 05:44:37AM +0100, Michael Haggerty wrote:
>> > So to handle all of the cases across Python versions as closely as
>> > possible to the old 2.x code, it might be necessary to make the code
>> > explicitly depend on the Python version number, like:
>> > 
>> >     hasher = _digest()
>> >     if sys.hexversion < 0x03000000:
>> >         pathbytes = repo.path
>> >     elif sys.hexversion < 0x03010000:
>> >         # If support for Python 3.0.x is desired (note: result can
>> >         # be different in this case than under 2.x or 3.1+):
>> >         pathbytes = repo.path.encode(sys.getfilesystemencoding(),
>> > 'backslashreplace')
>> >     else
>> >         pathbytes = repo.path.encode(sys.getfilesystemencoding(),
>> > 'surrogateescape')
>> >     hasher.update(pathbytes)
>> >     repo.hash = hasher.hexdigest()
>
> How about this?
>
>  git-remote-testpy.py | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/git-remote-testpy.py b/git-remote-testpy.py
> index c7a04ec..16b0c52 100644
> --- a/git-remote-testpy.py
> +++ b/git-remote-testpy.py
> @@ -36,6 +36,22 @@ if sys.hexversion < 0x02000000:
>      sys.stderr.write("git-remote-testgit: requires Python 2.0 or later.\n")
>      sys.exit(1)
>  
> +
> +def _encode_filepath(path):
> +    """Encodes a Unicode file path to a byte string.
> +
> +    On Python 2 this is a no-op; on Python 3 we encode the string as
> +    suggested by [1] which allows an exact round-trip from the command line
> +    to the filesystem.
> +
> +    [1] http://docs.python.org/3/c-api/unicode.html#file-system-encoding
> +
> +    """
> +    if sys.hexversion < 0x03000000:
> +        return path
> +    return path.encode('utf-8', 'surrogateescape')
> +
> +
>  def get_repo(alias, url):
>      """Returns a git repository object initialized for usage.
>      """
> @@ -45,7 +61,7 @@ def get_repo(alias, url):
>      repo.get_head()
>  
>      hasher = _digest()
> -    hasher.update(repo.path.encode('hex'))
> +    hasher.update(_encode_filepath(repo.path))
>      repo.hash = hasher.hexdigest()
>  
>      repo.get_base_path = lambda base: os.path.join(
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]