RE: [PATCH 0/6] Transition git-p4.py to support Python 3 only

> The commit messages could just really use some extra hand-holding and
> explanation, and a clear split-out of things related to the version bump v.s.
> things not needed for that, or unrelated refactorings.

Yes, I am getting this message loud and clear. I will resubmit with more detailed commit messages today.

To explain the story here: I started using git-p4 as part of my work-flow, and I expect to need it for several years to come. As I began to use it, I found a series of bugs, mostly related to character encoding. In fixing these, I found that some of the troubles were specific to Python 3. More precisely, Python 2's less strict distinction between byte sequences and text strings allowed the script to proceed on Python 2 even though what it was doing was in fact invalid and potentially corrupting data.

A common problem that users encounter is that the script attempts to decode incoming byte streams as UTF-8 text. On Python 3, this fails with an exception if the data contains byte sequences that are not valid UTF-8. In text files created on Windows, the CP1252 smart-quote characters 0x93 and 0x94 appear fairly frequently. Used this way, these bytes are not valid UTF-8, so if the script encounters any file or file name containing them, it fails with an exception.
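To illustrate the failure, here is a minimal sketch using only the standard `bytes.decode` method. The helper `try_decode` is hypothetical and is not part of git-p4. It shows that the same CP1252 bytes are rejected by the UTF-8 codec but decode cleanly as CP1252.

```python
from typing import Optional


def try_decode(raw: bytes, encoding: str) -> Optional[str]:
    """Hypothetical helper: return decoded text, or None if raw is invalid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# Bytes as a Windows editor might save them: CP1252 smart quotes around a word.
raw = b"\x93Hello\x94"

print(try_decode(raw, "utf-8"))          # None: 0x93 cannot start a UTF-8 character
print(ascii(try_decode(raw, "cp1252")))  # '\u201cHello\u201d' (curly double quotes)
```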

Tzadik Vanderhoof submitted a patch attempting to fix some of these issues in April 2021:
https://lore.kernel.org/git/20210429073905.837-1-tzadik.vanderhoof@xxxxxxxxx/

My two comments about this patch are that (1) it doesn't fix my issue, and (2) even with the proposed fallbackEncoding option, it still leaves git-p4 broken by default.

A fallbackEncoding option may still be necessary, but I found that most of the issues I encountered could be side-stepped simply by not decoding incoming data as UTF-8 in the first place.
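As a sketch of what "not decoding in the first place" can look like, assume file contents arrive from Perforce as `bytes` and are handed to `git fast-import`, whose `data` command takes a byte count followed by the raw bytes. The helper `write_blob` is hypothetical and is not the code from these patches. Because the length header counts bytes, the contents never need to be decoded or re-encoded.

```python
import io
from typing import BinaryIO


def write_blob(stream: BinaryIO, contents: bytes) -> None:
    """Hypothetical helper: emit a fast-import 'data' command, byte for byte."""
    # The count is in bytes, so CP1252, UTF-8 or binary content all pass through unchanged.
    stream.write(b"data %d\n" % len(contents))
    stream.write(contents)
    stream.write(b"\n")


buf = io.BytesIO()
write_blob(buf, b"\x93Hello\x94\n")  # CP1252 smart quotes survive untouched
print(buf.getvalue())
```

The design choice here is simply to keep the data as `bytes` from the moment it is read until the moment it is written, so no encoding guess is ever made.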

Keeping a clean separation between encoded and decoded text is much easier in Python 3. If Python 2 support must be maintained, separate code-paths will need careful testing on both platforms, which I don't regard as reasonable given that Python 2 is thoroughly deprecated. Therefore, this first patch-set focusses primarily on removing Python 2 support.
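A small sketch of why the separation is easier to enforce in Python 3: mixing `bytes` and `str` raises a `TypeError` instead of silently coercing, so every conversion has to name its encoding. The depot path and the helper `join_depot_path` are hypothetical and are used only for illustration.

```python
def join_depot_path(prefix: bytes, name: str, encoding: str = "utf-8") -> bytes:
    """Hypothetical helper: combine a bytes path prefix with a text file name."""
    # prefix + name would raise TypeError; the encoding must be chosen explicitly.
    return prefix + name.encode(encoding)


try:
    b"//depot/docs/" + "r\u00e9sum\u00e9.txt"  # implicit mixing of bytes and str
except TypeError as exc:
    print("rejected:", exc)

print(join_depot_path(b"//depot/docs/", "r\u00e9sum\u00e9.txt"))
print(join_depot_path(b"//depot/docs/", "r\u00e9sum\u00e9.txt", "cp1252"))
```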

Furthermore, because I expect to use git-p4 in my daily work-flow for some time to come, I am interested in investing some effort into improving it. There are bugs, unreliable behaviour, and user-hostile behaviour, as well as code that would benefit from clean-up and modernisation. In submitting these patches, I am trying to gauge to what extent such efforts would be accepted by the Git maintainers.

Is it preferable for patch-sets to have a tight focus on a single topic? I am already dividing up my full patch collection, and I can divide it further if requested. I am happy to do this; I was only worried that it might make it take longer to get all my patches through review.


> Some of these changes also just seem to be entirely unrelated refactorings,
> e.g. 6/6 where you're changing a multi-line commented regexp into
> something that's a dense one-liner. Does Python 3 not support the
> equivalent of Perl's /x, or is something else going on here?

I will improve the commit message to explain the changes being made here.

The regexp matches RCS keywords (https://www.perforce.com/manuals/p4guide/Content/P4Guide/filetypes.rcs.html) such as $File$, $Author$, etc.; it is a very simple match. We could keep it multi-line, though that seems like overkill to me.

The main significance of this change is that previously git-p4 would compile one of these two regexes for every single file processed. This patch simply pre-compiles the two regexes (now bytes regexes rather than UTF-8 text regexes) once, at the start of the script.
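A minimal sketch of the pre-compiled approach, assuming the keyword lists below (treating `+ko` files as expanding only $Id$ and $Header$; the exact sets git-p4 uses may differ) and a hypothetical `collapse_keywords` helper. Both patterns are compiled once, at import time, from `bytes` literals, so file contents can be matched and rewritten without being decoded.

```python
import re

# Compiled once at module load rather than once per file.
# Keyword lists are an assumption for illustration.
RE_KO_KEYWORDS = re.compile(rb"\$(Id|Header)(:[^$\n]+)?\$")
RE_K_KEYWORDS = re.compile(
    rb"\$(Id|Header|Author|Date|DateTime|Change|File|Revision)(:[^$\n]+)?\$"
)


def collapse_keywords(contents: bytes, ko_only: bool) -> bytes:
    """Hypothetical helper: turn b'$Author: alice $' back into b'$Author$'."""
    pattern = RE_KO_KEYWORDS if ko_only else RE_K_KEYWORDS
    # In the replacement, '$' is literal and \1 is the keyword name.
    return pattern.sub(rb"$\1$", contents)


print(collapse_keywords(b"$Author: alice $ $Id: //depot/a.c#3 $\n", ko_only=False))
print(collapse_keywords(b"\x93$File: a.c $\x94", ko_only=False))  # non-UTF-8 bytes are fine
```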