Re: [RFC/PATCH 2/8 v2] git_remote_helpers: fix input when running under Python 3

John Keeping <john@xxxxxxxxxxxxx> · Tue, 15 Jan 2013 21:54:12 +0000

On Tue, Jan 15, 2013 at 12:51:13PM -0800, Junio C Hamano wrote:
> John Keeping <john@xxxxxxxxxxxxx> writes:
>> Although 2to3 will fix most issues in Python 2 code to make it run under
>> Python 3, it does not handle the new strict separation between byte
>> strings and unicode strings.  There is one instance in
>> git_remote_helpers where we are caught by this, which is when reading
>> refs from "git for-each-ref".
>>
>> While we could fix this by explicitly handling refs as byte strings,
>> this is merely punting the problem to users of the library since the
>> same problem will be encountered as soon you want to display the ref
>> name to a user.
>>
>> Instead of doing this, explicit decode the incoming byte string into a
>> unicode string.
> 
> That really feels wrong.  Displaying is a separate issue and it is
> the _right_ thing to punt the problem at the lower-level machinery
> level.

But the display will require decoding the ref name to a Unicode string,
which depends on the encoding of the underlying ref name, so it feels
like it should be decoded where it's read (see [1]).

>> Following the lead of pygit2 (the Python bindings for
>> libgit2 - see [1] and [2]),...
> 
> I do not think other people getting it wrong is not an excuse to
> repeat the same mistake.
>
> Is it really so cumbersome to handle byte strings as byte strings in
> Python?

As [1] says, there is a potential for bugs whenever people attempt to
combine Unicode and byte strings.  I think it also violates the
principle of least surprise if a ref name (a string) doesn't behave like
a normal string.

[1] http://docs.python.org/3.3/howto/unicode.html#tips-for-writing-unicode-aware-programs

John
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html