Re: [PATCH 2/2] git-p4: do not decode data from perforce by default

Luke Diamand <luke@xxxxxxxxxxx> · Fri, 30 Apr 2021 15:33:11 +0000

On 30/04/2021 08:53, Andrew Oakley wrote:
On Thu, 29 Apr 2021 03:00:06 -0700
Tzadik Vanderhoof <tzadik.vanderhoof@xxxxxxxxx> wrote:
However, on Windows, UTF-8 strings passed to "p4 submit -d" are
somehow converted to the default Windows code page by the time they
are stored in the Perforce database, probably as part of the process
of passing the command line arguments to the Windows p4 executable.
However, the "code page" data is *not* converted to UTF-8 on the way
back from p4 to git-p4.py.  The only way to get it into UTF-8 is to
call string.decode().  As a result, this patch, which takes out the
call to string.decode(), will not work on Windows.

Thanks for that explanation, the reencoding of the data on Windows is
not something I was expecting.  Given the behaviour you've described, I
suspect that there might be two different problems that we are trying
to solve.

The Perforce depot I'm working with has a mixture of encodings, and
commits are created from a variety of different environments. The
majority of commits are ASCII or UTF-8, but a small number are in
some other encoding.  Any attempt to reencode the data is likely
to make the problem worse in at least some cases.

I suspect that other Perforce depots are used primarily from Windows
machines, and have data that is encoded in a mostly consistent way but
the encoding is not UTF-8.  Re-encoding the data for git makes sense in
that case.  Is this the kind of repository you have?

If there are these two different cases then we probably need to come up
with a patch that solves both issues.

For my cases where we've got a repository containing all sorts of junk,
it sounds like it might be awkward to create a test case that works on
Windows.

https://www.perforce.com/perforce/doc.current/user/i18nnotes.txt

Tzadik - is your server Unicode-enabled or not? That would be
interesting to know:

    p4 counters | grep -i unicode

I suspect it is not. It's only if Unicode is enabled that the server
will convert to/from UTF-8 (at least that's my understanding). Without
this setting, p4d and p4 are (probably) not doing any conversions.
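
If it helps, the same check could be scripted from Python. This is only a
rough sketch: the helper names are made up for illustration, and it assumes
that "p4 counters" prints one "name = value" pair per line and that a
non-zero "unicode" counter means the server is in Unicode mode.

```python
import subprocess


def unicode_counter_set(counters_output):
    """Return True if a non-zero 'unicode' counter appears in the text.

    Assumes each line looks like 'name = value' (an assumption about the
    output format of 'p4 counters').
    """
    for line in counters_output.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip().lower() == "unicode":
            return value.strip() not in ("", "0")
    return False


def server_is_unicode():
    """Run 'p4 counters' and inspect its output (hypothetical helper)."""
    out = subprocess.run(
        ["p4", "counters"], capture_output=True, check=True
    ).stdout
    # Counter names are plain ASCII; replace anything unexpected.
    return unicode_counter_set(out.decode("ascii", errors="replace"))
```

Only unicode_counter_set() is pure; server_is_unicode() needs a working p4
client and a reachable server.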

I think it might be useful to clarify exactly what conversions are 
actually happening.

I wonder what encoding Perforce thinks you've got in place.
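
To make the question concrete, here is a small sketch of how the same bytes
look under different interpretations. The function name is made up, and
cp1252 is only an assumed stand-in for "the default Windows code page"; the
real code page on a given machine may differ.

```python
def try_decodings(raw, encodings=("ascii", "utf-8", "cp1252")):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


# "\u00e9" is e-acute. Stored as cp1252 it is the single byte 0xE9, which
# is not valid UTF-8 on its own, so only the cp1252 decode succeeds.
as_cp1252 = try_decodings("\u00e9".encode("cp1252"))

# Stored as UTF-8 it is 0xC3 0xA9. That decodes cleanly as UTF-8, but it
# also "succeeds" as cp1252 and silently yields two wrong characters.
as_utf8 = try_decodings("\u00e9".encode("utf-8"))
```

The second case is the awkward one for a depot with mixed encodings: a
cp1252 decode of UTF-8 data does not fail, it just produces mojibake, so a
successful decode does not tell you the guess was right.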