On Tue, Jan 15, 2013 at 12:51:13PM -0800, Junio C Hamano wrote:
> John Keeping <john@xxxxxxxxxxxxx> writes:
>> Although 2to3 will fix most issues in Python 2 code to make it run under
>> Python 3, it does not handle the new strict separation between byte
>> strings and unicode strings. There is one instance in
>> git_remote_helpers where we are caught by this, which is when reading
>> refs from "git for-each-ref".
>>
>> While we could fix this by explicitly handling refs as byte strings,
>> this is merely punting the problem to users of the library since the
>> same problem will be encountered as soon you want to display the ref
>> name to a user.
>>
>> Instead of doing this, explicit decode the incoming byte string into a
>> unicode string.
>
> That really feels wrong. Displaying is a separate issue and it is
> the _right_ thing to punt the problem at the lower-level machinery
> level.

But the display will require decoding the ref name to a Unicode string,
which depends on the encoding of the underlying ref name, so it feels
like it should be decoded where it's read (see [1]).

>> Following the lead of pygit2 (the Python bindings for
>> libgit2 - see [1] and [2]),...
>
> I do not think other people getting it wrong is not an excuse to
> repeat the same mistake.
>
> Is it really so cumbersome to handle byte strings as byte strings in
> Python?

As [1] says, there is a potential for bugs whenever people attempt to
combine Unicode and byte strings. I think it also violates the
principle of least surprise if a ref name (a string) doesn't behave like
a normal string.

[1] http://docs.python.org/3.3/howto/unicode.html#tips-for-writing-unicode-aware-programs

John
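
To illustrate the point about combining byte strings and Unicode
strings, here is a minimal sketch under Python 3. It is not the code
from git_remote_helpers: the helper name parse_refs, the sample input,
and the choice of UTF-8 as the encoding are all assumptions made for
the example.

```python
def parse_refs(raw, encoding="utf-8"):
    """Decode "<sha> <refname>" lines (bytes) into a dict of str.

    Hypothetical helper; the encoding is an assumption, since git
    itself does not promise that ref names are UTF-8.
    """
    refs = {}
    for line in raw.splitlines():
        if not line:
            continue
        # Decode once, at the point where the bytes are read.
        sha, name = line.decode(encoding).split(" ", 1)
        refs[name] = sha
    return refs


# Made-up sample in the shape produced by
# git for-each-ref --format='%(objectname) %(refname)'
sample = (
    b"1111111111111111111111111111111111111111 refs/heads/master\n"
    b"2222222222222222222222222222222222222222 refs/heads/caf\xc3\xa9\n"
)

refs = parse_refs(sample)

# Combining an undecoded ref name with a normal string fails in Python 3.
try:
    "ref: " + b"refs/heads/master"
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

With the refs decoded where they are read, callers can format or
compare the names like any other string. If they were left as bytes,
each caller would have to pick an encoding before displaying them.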