"Ben Keene via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Ben Keene <bkeene@xxxxxxxxxxxxxx> > Subject: Re: [PATCH v2 1/3] Cast byte strings to unicode strings in python3 That is "how" this patch tries to achieve something that is not explained in this proposed log message. Worse, the rest of the proposed log message is entirely empty and does not help readers understand what the author wanted to do, or decide if applying these patches is a good idea. You wrote something about Python 3 having issues while comparing the strings read out of process.popen() against string literals in the code elsewhere in the cover letter. That information is probably very relevant to explain why this commit exists, i.e. it should be explained here. Here is what I _think_ is close to why you wrote this patch, but as I already said, I am far from being familiar with Py2to3 differences, so there may be factual misunderstanding you would want to fix, even if you decide to pick some pieces out of it when sending an updated patch. Subject: [PATCH] git-p4: compare "p4" output with the right string type Communicating with a subprocess yields a byte string in both Python 2 and Python 3. Comparing the output or the error string with string literals makes the code fail, as string literals in Python 3 are unicode strings and not byte strings. Introduce a ustring() helper, which returns the corresponding unicode string for a given byte string (or return the argument as-is, when a unicode string is given) in Python 3, but returns the given bytestring as-is in Python 2. Use it to turn strings obtained from subprocess to the type suitable to compare with the string literals in both versions of Python. What does the name "ustring()" stand for? What purpose does the function serve at the higher conceptual level? Name the function after the answer to that question and the code would become easier to explain. It certainly is not "make sure we have unicode string", as the name is shared between Python 2 and Python 3, and in the older Python 2 world, you want the helper to mean "keep the string a bytestring", not turning it into a unicode string. So I doubt ustring() is a great name for this helper. > if skip_info: > if 'code' in entry and entry['code'] == 'info': > continue > + if b'code' in entry and entry[b'code'] == b'info': > + continue This one explicitly calls for 'bytes', which I think would work correctly with Python 2. But why would we have to prepare for both variants existing in the entry[] hash (especially when working with Python3)? Shouldn't the code that populates entry[] be responsible for turning a byte string into a unicode string? Also, is subprocess.communicate the only way bytestrings can come to "git-p4" program, or are there other avenues? What I am wondering is if it is sensible to declare that when running under Python 3, the program will internally use unicode strings for everything and convert any bytestring to unicode string as soon as it enters [*1*] That would mean ... > if cb is not None: > cb(entry) > else: > - result.append(entry) > + if isunicode: > + out = {} > + for key, value in entry.items(): > + out[ustring(key)] = ustring(value) > + result.append(out) > + else: > + result.append(entry) ... a hunk like this would become entirely unnecessary, as keys and values of entry[] that are read from outside world would have already been made into unicode strings under Python 3. By the way, would values in entry[] always some string (iow, can they have a literal number like 47 and 3.14)? > except EOFError: > pass > exitCode = p4.wait() > @@ -792,7 +813,7 @@ def gitConfig(key, typeSpecifier=None): > cmd += [ key ] > s = read_pipe(cmd, ignore_error=True) > _gitConfig[key] = s.strip() > - return _gitConfig[key] > + return ustring(_gitConfig[key]) Likewise. I'd expect, if we were to declare that our internal strings are all unicode in Python 3, we'd be using a thin wrapper of read_pipe() that yields a unicode string, so 's' would not be a bytestring, and s.strip() would not be either. > def gitConfigBool(key): > """Return a bool, using git config --bool. It is True only if the > @@ -860,6 +881,7 @@ def branch_exists(branch): > cmd = [ "git", "rev-parse", "--symbolic", "--verify", branch ] > p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE) > out, _ = p.communicate() > + out = ustring(out) But perhaps it is a bit too much and it is easier to call the conversion helper on the results obtained by read_pipe() that gives us bytestring, like this one (and one above that) does? I dunno. Thanks. [Footnote] *1* If it is not just string comparison but the general direction were to consistently use unicode strings under Python 3, which I think makes sense at the conceptual level, the sample log message I prepared in the earlier part of this response needs to be updated, so that it is not so message/comparison centric. I said "at the conceptual level", because there may be practical reasons not to use unicode everywhere (e.g. performance might degrade and some bytestrings used may not be text data in the first place).