Re: [PATCH v5 5/7] convert: add 'working-tree-encoding' attribute

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On 30 Jan 2018, at 22:56, Junio C Hamano <gitster@xxxxxxxxx> wrote:
> 
> Lars Schneider <larsxschneider@xxxxxxxxx> writes:
> 
>>> On 30 Jan 2018, at 21:05, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>> 
>>> tboegi@xxxxxx writes:
>>> 
>>>> +	if ((conv_flags & CONV_WRITE_OBJECT) && !strcmp(enc->name, "SHIFT-JIS")) {
>>>> +		char *re_src;
>>>> +		int re_src_len;
>>> 
>>> I think it is a bad idea to 
>>> 
>>> (1) not check without CONV_WRITE_OBJECT here.
>> 
>> The idea is to perform the roundtrip check *only* if we 
>> actually write to Git. In all other cases we don't care
>> if the encoding roundtrips.
>> 
>> "git checkout" is such a case where we don't care as 
>> noted by Peff here:
>> https://public-inbox.org/git/20171215095838.GA3567@xxxxxxxxxxxxxxxxxxxxx/
>> 
>> Do you agree?
> 
> I am not sure why this is special cased and other codepaths have "if
> WRITE_OBJECT then die, otherwise error" checks, so no, I do not
> agree with your reasoning, at least not yet.

The convert_to_git()/encode_to_git() machinery is used in two different
kinds of code paths:

Some code paths actually write to the Git database (indicated by the 
CONV_WRITE_OBJECT flag). I consider these the "critical/important" code 
paths and I don't want to tolerate any encoding errors in these cases as 
the errors would be "forever" in the Git database. That's why I call 
die() on errors for these cases to abort whatever we are doing. 

Other code paths do not write to the Git database (e.g. during "git 
checkout" we use the code to ensure that we are moving away from the 
exact state that we think we are moving away). In these code paths I am 
less concerned about encoding errors. I also don't want to abort the 
operation (e.g. "git checkout") in these cases. That's why I only inform
the user about the problem with an error message.

The encoding round-trip check can be expensive. That's why I decided 
initially to only execute the check in the "critical/important" 
write-to-Git-database situations (CONV_WRITE_OBJECT flag!). I also 
decided to run it only if the "SHIFT-JIS" encoding is used as this was
the only encoding that I could find which reportedly does not round-trip
with UTF-8 (although I was not able to replicate the round-trip 
problems). 

I want to change the current implementation as follows:

I want to check the round-trip encoding only if the the environment 
variable "GIT_WORKING_TREE_ENCODING_ROUNDTRIP_CHECK" is set. This way
a user can check the round-trip if necessary for *any* encoding. I
don't want to make it a git config because that setting should only 
rarely be used for debugging purposes.

Performing the round-trip check every time is not necessary from my 
point of view because it can be expensive and I was not able to generate
a test case which *does not* round-trip without triggering any other
iconv error.

- Lars



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux