Re: [PATCH 2/2] git-p4: do not decode data from perforce by default

Tzadik Vanderhoof <tzadik.vanderhoof@xxxxxxxxx> · Fri, 30 Apr 2021 11:08:57 -0700

On Fri, Apr 30, 2021 at 8:33 AM Luke Diamand <luke@xxxxxxxxxxx> wrote:
>
> Tzadik - is your server unicode enabled or not? That would be
> interesting to know:
>
>      p4 counters | grep -i unicode
>
> I suspect it is not. It's only if unicode is enabled that the server
> will convert to/from utf8 (at least that's my understanding). Without
> this setting, p4d and p4 are (probably) not doing any conversions.

My server is not unicode.

These conversions are happening even with a non-Unicode perforce db.
I don't think it's the p4d code per se that is doing the conversion, but
rather an interaction between the OS and the code, which is different
under Linux vs Windows.  If you create a trivial C program that dumps
the hex values of the bytes it receives in argv, you can see this
different behavior:

#include <stdio.h>

void main(int argc, char *argv[]) {
    int i, j;
    char *s;
    for (i = 1; i < argc; ++i) {
        s = argv[i];
        for (j = 0; s[j] != '\0'; ++j)
            printf(" %X", (unsigned char)s[j]);
        printf("\n");
        printf("[%s]\n\n", s);
    }
}

When built with Visual Studio and called from Cygwin, if you pass in
args with UTF-8 encoded characters, the program will spit them out in
cp1252. If you compile it on a Linux system using gcc, it will spit them out
in UTF-8 (unchanged).  I suspect that's what's happening with p4d on
Windows vs Linux.

In any event, if you look at my patch (v6 is the latest...
https://lore.kernel.org/git/20210429073905.837-1-tzadik.vanderhoof@xxxxxxxxx/ ),
you will see I have written tests that pass under both Linux and Windows.
(If you want to run them yourself, you need to base my patch off of "master",
not "seen").  The tests make clear what the different behavior is and
also show that p4d is not set to Unicode (since the tests do not change the
default setting).