Re: [PATCH 2/2] git-p4: do not decode data from perforce by default

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 05 May 2021 13:34:32 +0900

Tzadik Vanderhoof <tzadik.vanderhoof@xxxxxxxxx> writes:

> On Tue, May 4, 2021 at 6:11 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>
>> Tzadik Vanderhoof <tzadik.vanderhoof@xxxxxxxxx> writes:
>>
>> > On Tue, May 4, 2021 at 2:01 PM Andrew Oakley <andrew@xxxxxxxxxxxxx> wrote:
>> >> The key thing that I'm trying to point out here is that the encoding is
>> >> not necessarily consistent between different commits.  The changes that
>> >> you have proposed force you to pick one encoding that will be used for
>> >> every commit.  If it's wrong then data will be corrupted, and there is
>> >> no option provided to avoid that.  The only way I can see to avoid this
>> >> issue is to not attempt to re-encode the data - just pass it directly
>> >> to git.
>> > ...
> Are you talking about a scenario where most of the commits are UTF-8,
> one is "cp1252" and another one is "cp1251", so a total of 3 encodings
> are used in the Perforce depot?  I don't think that is a common scenario.

Yes.  I think that is where "not necessarily consistent between
different commits" leads us: a depot is not limited to only two
encodings.
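
To make the concern concrete, here is a minimal sketch (not git-p4
code; the sample strings and the decode_all() helper are made up for
illustration) of what happens when one fixed encoding is applied to
descriptions that were written in three different encodings:

```python
# Hypothetical commit descriptions as raw bytes, each written by a
# client that used a different encoding.  The text is invented.
raw_descriptions = {
    "utf-8": "café".encode("utf-8"),
    "cp1252": "café".encode("cp1252"),
    "cp1251": "Привет".encode("cp1251"),
}


def decode_all(assumed_encoding):
    """Decode every description with one fixed encoding, the way a
    single depot-wide setting would.  Undecodable bytes are replaced
    with U+FFFD instead of raising."""
    return {
        actual: raw.decode(assumed_encoding, errors="replace")
        for actual, raw in raw_descriptions.items()
    }


if __name__ == "__main__":
    for assumed in ("utf-8", "cp1252", "cp1251"):
        print(assumed, decode_all(assumed))
```

Whichever single encoding is assumed, only the descriptions that were
really written in it come out intact; the others are silently turned
into mojibake or replacement characters.  Passing the bytes through
without decoding them avoids that loss, at the price of leaving the
interpretation to whoever reads the history later.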

> I agree with the idea that if you know what the encoding is, then
> why not just use that knowledge to convert that to UTF-8, rather
> than use the encoding header.