On Sun, Jan 13, 2013 at 12:41:30AM +0000, John Keeping wrote:
> On Sat, Jan 12, 2013 at 06:43:04PM -0500, Pete Wyckoff wrote:
>> Can you give me some hints about the byte/unicode string issues
>> in git-p4.py? There's really only one place that does:
>>
>>     p4 = subprocess.Popen("p4 -G ...")
>>     marshal.load(p4.stdout)
>>
>> If that's the only issue, this might not be too painful.
>
> The problem is that what gets loaded there is a dictionary (encoded by
> p4) that maps byte strings to byte strings, so all of the accesses to
> that dictionary need to either:
>
>     1) explicitly call encode() on a string constant
> or  2) use a byte string constant with a "b" prefix
>
> Or we could re-write the dictionary once, which handles the keys... but
> some of the values are also used as strings and we can't handle that as
> a one-off conversion, since in other places we really do want the byte
> string (think content of binary files).
>
> Basically a thorough audit of all access to variables that come from p4
> would be needed, with explicit decode()s for authors, dates, etc.

Having thought about this a bit more, another possibility would be to
apply this transformation once using something like this (completely
untested, I haven't looked up the keys of interest):

-- >8 --
import marshal

def _noop(s):
    # Leave the value as raw bytes.
    return s

def _decode(s):
    # Turn a UTF-8 byte string into a str.
    return s.decode('utf-8')

# Placeholder keys: not yet checked against what p4 actually emits.
CONVERSION_MAP = {
    'user': _decode,
    'data': _decode,
}

def read_p4_dict(p4):
    # p4 is the subprocess.Popen running "p4 -G ..."
    # (the function name is illustrative).
    d = marshal.load(p4.stdout)
    retval = {}
    for k, v in d.items():
        # Keys are always decoded; values only if listed in the map.
        key = k.decode('utf-8')
        retval[key] = CONVERSION_MAP.get(key, _noop)(v)
    return retval
-- 8< --

Obviously this isn't ideal, but without p4 gaining a Python 3 output
mode I suspect this would be the best we could do.

John
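One way to try the conversion-map idea without a Perforce server is to fake the "p4 -G" output: concatenate marshal.dumps() of a few dicts whose keys and values are all byte strings, then read records until marshal.load() raises EOFError (a real "p4 -G" command can emit several dicts in a row). This is a sketch under assumptions: the helper names convert_p4_record/read_p4_records and the keys chosen for CONVERSION_MAP ('user', 'desc') are hypothetical, and unlike the map in the message above, 'data' is deliberately left undecoded so a binary value survives as bytes.

```python
import io
import marshal

def _noop(s):
    # Leave the value untouched as raw bytes (e.g. binary file content).
    return s

def _decode(s):
    # Turn a UTF-8 encoded byte string into a str.
    return s.decode('utf-8')

# Hypothetical choice of keys to decode; not checked against real p4 output.
CONVERSION_MAP = {
    'user': _decode,
    'desc': _decode,
}

def convert_p4_record(d):
    """Decode every key to str, and only the values listed in CONVERSION_MAP."""
    retval = {}
    for k, v in d.items():
        key = k.decode('utf-8')
        retval[key] = CONVERSION_MAP.get(key, _noop)(v)
    return retval

def read_p4_records(stream):
    """Read successive marshalled dicts until the stream is exhausted."""
    records = []
    while True:
        try:
            d = marshal.load(stream)
        except EOFError:
            # No more objects in the stream.
            break
        records.append(convert_p4_record(d))
    return records

# Simulated "p4 -G" output: two marshalled dicts with byte-string keys/values.
fake_output = io.BytesIO(
    marshal.dumps({b'user': b'jkeeping', b'data': b'\x00\xffbinary'})
    + marshal.dumps({b'user': b'pw', b'desc': b'fix p4 \xc3\xa9'})
)

records = read_p4_records(fake_output)
print(records)
```

Because the conversion happens once per record, the rest of the code can use plain str keys, while any value not named in the map (here 'data') stays as bytes.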