Although 2to3 will fix most issues in Python 2 code to make it run under Python 3, it does not handle the new strict separation between byte strings and unicode strings. There is one instance in git_remote_helpers where we are caught by this, which is when reading refs from "git for-each-ref". While we could fix this by explicitly handling refs as byte strings, this is merely punting the problem to users of the library since the same problem will be encountered as soon you want to display the ref name to a user. Instead of doing this, explicit decode the incoming byte string into a unicode string. Following the lead of pygit2 (the Python bindings for libgit2 - see [1] and [2]), use the filesystem encoding by default, providing a way for callers to override this if necessary. [1] https://github.com/libgit2/pygit2/blob/e34911b63e5d2266f9f72a4e3f32e27b13190feb/src/pygit2/reference.c#L261 [2] https://github.com/libgit2/pygit2/blob/e34911b63e5d2266f9f72a4e3f32e27b13190feb/include/pygit2/utils.h#L55 Signed-off-by: John Keeping <john@xxxxxxxxxxxxx> --- I think this is in fact the best way to handle this, and I hope the above description clarified why I don't think we want to treat refs as byte strings in Python 3. My only remaining question is whether it would be better to set the error mode when decoding to "replace" instead of "strict" (the default). "strict" will cause a UnicodeError if the string cannot be decoded whereas "replace" will use U+FFFD (the replacement character). [3] I think it's better to use "strict" and let the user know that something has gone wrong rather than silently change the string, but I'd welcome other opinions. [3] http://docs.python.org/2/library/codecs.html#codec-base-classes git_remote_helpers/git/importer.py | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/git_remote_helpers/git/importer.py b/git_remote_helpers/git/importer.py index e28cc8f..5bc16a4 100644 --- a/git_remote_helpers/git/importer.py +++ b/git_remote_helpers/git/importer.py @@ -1,5 +1,6 @@ import os import subprocess +import sys from git_remote_helpers.util import check_call, check_output @@ -10,17 +11,26 @@ class GitImporter(object): This importer simply delegates to git fast-import. """ - def __init__(self, repo): + def __init__(self, repo, ref_encoding=None): """Creates a new importer for the specified repo. + + If ref_encoding is specified that refs are decoded using that + encoding. Otherwise the system filesystem encoding is used. """ self.repo = repo + self.ref_encoding = ref_encoding def get_refs(self, gitdir): """Returns a dictionary with refs. """ args = ["git", "--git-dir=" + gitdir, "for-each-ref", "refs/heads"] - lines = check_output(args).strip().split('\n') + encoding = self.ref_encoding + if encoding is None: + encoding = sys.getfilesystemencoding() + if encoding is None: + encoding = sys.getdefaultencoding() + lines = check_output(args).decode(encoding).strip().split('\n') refs = {} for line in lines: value, name = line.split(' ') -- 1.8.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html