> On 30 Jan 2018, at 22:56, Junio C Hamano <gitster@xxxxxxxxx> wrote: > > Lars Schneider <larsxschneider@xxxxxxxxx> writes: > >>> On 30 Jan 2018, at 21:05, Junio C Hamano <gitster@xxxxxxxxx> wrote: >>> >>> tboegi@xxxxxx writes: >>> >>>> + if ((conv_flags & CONV_WRITE_OBJECT) && !strcmp(enc->name, "SHIFT-JIS")) { >>>> + char *re_src; >>>> + int re_src_len; >>> >>> I think it is a bad idea to >>> >>> (1) not check without CONV_WRITE_OBJECT here. >> >> The idea is to perform the roundtrip check *only* if we >> actually write to Git. In all other cases we don't care >> if the encoding roundtrips. >> >> "git checkout" is such a case where we don't care as >> noted by Peff here: >> https://public-inbox.org/git/20171215095838.GA3567@xxxxxxxxxxxxxxxxxxxxx/ >> >> Do you agree? > > I am not sure why this is special cased and other codepaths have "if > WRITE_OBJECT then die, otherwise error" checks, so no, I do not > agree with your reasoning, at least not yet. The convert_to_git()/encode_to_git() machinery is used in two different kinds of code paths: Some code paths actually write to the Git database (indicated by the CONV_WRITE_OBJECT flag). I consider these the "critical/important" code paths and I don't want to tolerate any encoding errors in these cases as the errors would be "forever" in the Git database. That's why I call die() on errors for these cases to abort whatever we are doing. Other code paths do not write to the Git database (e.g. during "git checkout" we use the code to ensure that we are moving away from the exact state that we think we are moving away). In these code paths I am less concerned about encoding errors. I also don't want to abort the operation (e.g. "git checkout") in these cases. That's why I only inform the user about the problem with an error message. The encoding round-trip check can be expensive. That's why I decided initially to only execute the check in the "critical/important" write-to-Git-database situations (CONV_WRITE_OBJECT flag!). I also decided to run it only if the "SHIFT-JIS" encoding is used as this was the only encoding that I could find which reportedly does not round-trip with UTF-8 (although I was not able to replicate the round-trip problems). I want to change the current implementation as follows: I want to check the round-trip encoding only if the the environment variable "GIT_WORKING_TREE_ENCODING_ROUNDTRIP_CHECK" is set. This way a user can check the round-trip if necessary for *any* encoding. I don't want to make it a git config because that setting should only rarely be used for debugging purposes. Performing the round-trip check every time is not necessary from my point of view because it can be expensive and I was not able to generate a test case which *does not* round-trip without triggering any other iconv error. - Lars