[RFC/PATCH 2/8 v2] git_remote_helpers: fix input when running under Python 3

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Although 2to3 will fix most issues in Python 2 code to make it run under
Python 3, it does not handle the new strict separation between byte
strings and unicode strings.  There is one instance in
git_remote_helpers where we are caught by this, which is when reading
refs from "git for-each-ref".

While we could fix this by explicitly handling refs as byte strings,
this is merely punting the problem to users of the library since the
same problem will be encountered as soon you want to display the ref
name to a user.

Instead of doing this, explicit decode the incoming byte string into a
unicode string.  Following the lead of pygit2 (the Python bindings for
libgit2 - see [1] and [2]), use the filesystem encoding by default,
providing a way for callers to override this if necessary.

[1] https://github.com/libgit2/pygit2/blob/e34911b63e5d2266f9f72a4e3f32e27b13190feb/src/pygit2/reference.c#L261
[2] https://github.com/libgit2/pygit2/blob/e34911b63e5d2266f9f72a4e3f32e27b13190feb/include/pygit2/utils.h#L55

Signed-off-by: John Keeping <john@xxxxxxxxxxxxx>
---

I think this is in fact the best way to handle this, and I hope the
above description clarified why I don't think we want to treat refs as
byte strings in Python 3.

My only remaining question is whether it would be better to set the
error mode when decoding to "replace" instead of "strict" (the default).
"strict" will cause a UnicodeError if the string cannot be decoded
whereas "replace" will use U+FFFD (the replacement character). [3]

I think it's better to use "strict" and let the user know that
something has gone wrong rather than silently change the string, but I'd
welcome other opinions.

[3] http://docs.python.org/2/library/codecs.html#codec-base-classes

 git_remote_helpers/git/importer.py | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/git_remote_helpers/git/importer.py b/git_remote_helpers/git/importer.py
index e28cc8f..5bc16a4 100644
--- a/git_remote_helpers/git/importer.py
+++ b/git_remote_helpers/git/importer.py
@@ -1,5 +1,6 @@
 import os
 import subprocess
+import sys
 
 from git_remote_helpers.util import check_call, check_output
 
@@ -10,17 +11,26 @@ class GitImporter(object):
     This importer simply delegates to git fast-import.
     """
 
-    def __init__(self, repo):
+    def __init__(self, repo, ref_encoding=None):
         """Creates a new importer for the specified repo.
+
+        If ref_encoding is specified that refs are decoded using that
+        encoding.  Otherwise the system filesystem encoding is used.
         """
 
         self.repo = repo
+        self.ref_encoding = ref_encoding
 
     def get_refs(self, gitdir):
         """Returns a dictionary with refs.
         """
         args = ["git", "--git-dir=" + gitdir, "for-each-ref", "refs/heads"]
-        lines = check_output(args).strip().split('\n')
+        encoding = self.ref_encoding
+        if encoding is None:
+            encoding = sys.getfilesystemencoding()
+            if encoding is None:
+                encoding = sys.getdefaultencoding()
+        lines = check_output(args).decode(encoding).strip().split('\n')
         refs = {}
         for line in lines:
             value, name = line.split(' ')
-- 
1.8.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]