Re: [PATCH] git-p4: preserve utf8 BOM when importing from p4 to git

Tao Klerks <tao@xxxxxxxxxx> · Mon, 19 Dec 2022 10:09:44 +0100

On Thu, Dec 15, 2022 at 12:11 AM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> Tao Klerks <tao@xxxxxxxxxx> writes:
>
> > Again, I'm not attempting to defend the breakage - just outlining why
> > I don't see how "using the Perforce variable P4CHARSET" would solve
> > anything.
> >
> >> This new behavior has made it impossible for
> >> me to submit changes to files of type "utf8"!  Any attempt fails with
> >> "patch does not apply" and the erroneously added BOM is the cause.
> >
> > I will try to understand the "unicode enabled server" behavior today
> > or tomorrow and see what options might make sense.
> >
> >> I propose rolling back the patch that introduced this behavior,
> >
> > Junio is the expert here and has noted it's a little late for that. I
> > obviously defer to his expertise as to git's release and backout
> > strategy.
> >
>
> It sounds like, if your conjecture turns out to be correct in that
> those P4 users who interact unicode enabled servers would have
> P4CHARSET and others don't, we may not need an extra configuration
> but pay attention to the P4CHARSET variable (or lack of it) and
> switch the behaviour.

Yes, I suspect some sort of detection will be required. There appears
to be no way to query the server for this "unicode mode" directly, but
you can force the client to try connecting in the "wrong" mode for the
server, and catch the corresponding error. Ugly, but effective.

(The reason it's hard to just test for "P4CHARSET" is that there are
several ways to set it, not just the environment, and the setting
exists at more than one level, per-connection or global. Setting the
global override and testing for failure is likely to be safer than
attempting to understand/evaluate the hierarchy of settings.)
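
To make the idea concrete, here is a rough sketch of such a probe. It
is not what git-p4 does today. It assumes the global "-C" option of the
p4 client is the override in question, and that the two error strings
below are what the client prints on a mode mismatch; both need checking
against real servers. The names classify_probe and probe_unicode_mode
are made up for illustration.

```python
import subprocess

# Assumed error texts for a client/server mode mismatch; verify these
# against the p4 client and server versions actually in use.
UNICODE_SERVER_ERR = "Unicode server permits only unicode enabled clients"
PLAIN_SERVER_ERR = "Unicode clients require a unicode enabled server"


def classify_probe(output):
    """Map the output of a deliberately mismatched p4 call to a mode."""
    if UNICODE_SERVER_ERR in output:
        return "unicode"
    if PLAIN_SERVER_ERR in output:
        return "non-unicode"
    return "unknown"


def probe_unicode_mode(run=subprocess.run):
    """Hypothetical helper: force each charset in turn, read the error.

    "-C none" should be rejected by a unicode-mode server, and
    "-C utf8" by a non-unicode one, so one of the two calls is always
    the "wrong" mode for whatever server we are talking to.
    """
    for charset in ("none", "utf8"):
        result = run(["p4", "-C", charset, "info"],
                     capture_output=True, text=True)
        mode = classify_probe(result.stdout + result.stderr)
        if mode != "unknown":
            return mode
    return "unknown"
```

The command runner is a parameter only so that the logic can be
exercised without a Perforce server; real use would call
probe_unicode_mode() with no arguments.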

> > I would like to have a go at understanding what the options are (how
> > we can get correct and functional behavior for all users), before
> > proposing a specific course of action.

I have finally managed to start testing with the "unicode enabled
server" behavior.

So far I've learned that:
 * Some of our tests around file content encoding handling do fail
with the server in this mode (not necessarily because we're doing the
wrong thing, but because the server's behavior doesn't match our
expectations). These failures may correspond to bugs to be fixed, or
to tests that need adjusting to match appropriate expectations in
this "unicode enabled mode".
 * Our tests around "git p4 submit" *don't* seem to fail, even on
utf-8-bom files - so I have not yet reproduced Tzadik's issue

(I keep placing "unicode enabled server" in quotes because I don't
want to give the impression that perforce in "normal" mode doesn't
handle unicode content - it absolutely does, but... differently.)

I definitely need to keep testing this to understand what the right
thing to do might be for Tzadik (and others like him, of course).

Tzadik, could you provide any more detail about the failing situation?
One piece of info that might be particularly helpful is *what is the
exact/full p4 FileType of the problem file?*

Thanks,
Tao