Re: [PATCH] git-p4: fix crlf handling for utf16 files on Windows

"Moritz Baumann via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:

> From: Moritz Baumann <moritz.baumann@xxxxxxx>

Can you describe briefly what problem is being solved and how the
change solves it in this place above your Sign-off?  The title says
"fix", without saying how the behaviour by the current code is
"broken", so that is one thing you can describe.  It talks about
"UTF-16 files on Windows", but does it mean git-p4 running on
Windows or git-p4 running anywhere that (over the wire) talks with
P4 running on Windows?  IOW, would the same problem trigger if you
are on macOS but the contents of the file you exchange with P4
happen to be in UTF-16?

These are the things you can describe to help those who are not you
(i.e. without access to an environment similar to what you saw the
problem on) understand the issue and help them convince themselves
that the patch they are seeing is a sensible solution.  Without such
a description, the patch is hard to evaluate.

> Signed-off-by: Moritz Baumann <moritz.baumann@xxxxxxx>
> ---

> diff --git a/git-p4.py b/git-p4.py
> index 8fbf6eb1fe3..0a9d7e2ed7c 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -3148,7 +3148,7 @@ class P4Sync(Command, P4UserMap):
>                      raise e
>              else:
>                  if p4_version_string().find('/NT') >= 0:
> -                    text = text.replace(b'\r\n', b'\n')
> +                    text = text.replace(b'\x0d\x00\x0a\x00', b'\x0a\x00')
>                  contents = [text]
>  
>          if type_base == "apple":

OK, the part being touched is inside this context:

        if type_base == "utf16":
            # ...
            # But ascii text saved as -t utf16 is completely mangled.
            # Invoke print -o to get the real contents.
            #
            # On windows, the newlines will always be mangled by print, so put
            # them back too.  This is not needed to the cygwin windows version,
            # just the native "NT" type.
            #

            try:
                text = ...
            except Exception as e:
                ...
            else:
                if p4_version_string().find('/NT') >= 0:
                    text = text.replace(b'\r\n', b'\n')
                contents = [text]

So the intent of the existing code is "we know we are dealing with
UTF-16 text, and after successfully reading 'text' without
exception, we need to convert CRLF back to LF if we are on 'the
native NT type'".  Presumably 'text' that came from
p4_read_pipe(... raw=True) is not a unicode string but just a bunch of
bytes, so each "char" is represented as a two-byte sequence in UTF-16?
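
To make that guess concrete, here is a minimal sketch of the two
replace() calls applied to raw bytes.  It assumes the payload is
little-endian UTF-16 without a BOM, which the patch itself does not
state; fix_crlf_old() and fix_crlf_new() are hypothetical names used
only for this illustration and are not part of git-p4.py.

```python
# Illustration only: both helpers are hypothetical wrappers around the
# replace() calls on the '-' and '+' lines of the diff.

def fix_crlf_old(raw: bytes) -> bytes:
    # Existing code: looks for the two-byte ASCII CRLF sequence.
    return raw.replace(b'\r\n', b'\n')

def fix_crlf_new(raw: bytes) -> bytes:
    # Patched code: looks for CRLF as two little-endian UTF-16 code units.
    return raw.replace(b'\x0d\x00\x0a\x00', b'\x0a\x00')

# Assumption: the bytes are UTF-16LE, so CR is 0d 00 and LF is 0a 00.
raw = "a\r\nb".encode("utf-16-le")

# The old pattern never matches, because a NUL byte sits between CR and LF.
unchanged = fix_crlf_old(raw)

# The new pattern matches and leaves a valid UTF-16LE "a\nb".
converted = fix_crlf_new(raw).decode("utf-16-le")

# The old pattern can also match where it should not: U+0D41 encodes as
# 41 0d, and a following LF starts with 0a, so the bytes 0d 0a appear
# across a character boundary and one byte gets dropped.
damaged = fix_crlf_old("\u0d41\n".encode("utf-16-le"))
```

If the assumption about byte order holds, this would explain both why
the existing code fails to convert anything in the common case and
why it can damage a file in a rare one.  A big-endian payload would
need a different pattern, which is another thing the log message
could spell out.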

With that (speculative) understanding, I can guess that the patch
makes sense, but the patch should not make readers guess.

Thanks.


