RE: [PATCH] git-p4: fix crlf handling for utf16 files on Windows

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Junio,

Thank you for your notes. I assumed the intent of the original code would be clear, in which case the fix should also be clear, but I am happy to elaborate.

> Can you describe briefly what problem is being solved and how the change
> solves it in this place above your Sign-off?  […] It talks about
> "UTF-16 files on Windows", but does it mean git-p4 running on Windows or
> git-p4 running anywhere that (over the wire) talks with
> P4 running on Windows?  IOW, would the same problem trigger if you are on
> macOS but the contents of the file you exchange with P4 happens to be in
> UTF-16?

The potential problem that the original code was trying to solve is the following: If a file is marked as utf16 in Perforce, and if the Perforce client is on Windows, then Perforce will replace all LF line endings with CRLF when the file is synced. This is different from git's autocrlf behavior, which ignores UTF-16 encoded files and always treats them as binary files. Without special handling, this can lead to git-p4 creating files with different hashes when run on Windows. (Which is how I stumbled upon this issue.)

Therefore, git-p4 checks the Perforce "file type" and tries to undo the line endings changes.

> So the intent of the existing code is "we know we are dealing with
> UTF-16 text, and after successfully reading 'text' without exception, we need
> to convert CRLF back to LF if we are on 'the native NT type'".  Presumably
> 'text' that came from p4_read_pipe(... raw=True) is not unicode string but just
> a bunch of bytes, so each "char" is represented as two-byte sequence in UTF-
> 16?

Exactly. The original code tried to do the right thing to ensure stable hashes that are independent of the operating system git-p4 is run on, but failed to do so successfully. With my fix, I finally got deterministic hashes on my test repository.

> With that (speculative) understanding, I can guess that the patch makes sense,
> but the patch should not make readers guess.

Do you need me to resubmit the patch with an explanatory description? If so, I can try to summarize the above.

Best regards,
Moritz




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux