Lars Schneider <larsxschneider@xxxxxxxxx> writes: >> I am not sure why this is special cased and other codepaths have "if >> WRITE_OBJECT then die, otherwise error" checks, so no, I do not >> agree with your reasoning, at least not yet. > > The convert_to_git()/encode_to_git() machinery is used in two different > kinds of code paths: > > Some code paths actually write to the Git database (indicated by the > CONV_WRITE_OBJECT flag). I consider these the "critical/important" code > paths and I don't want to tolerate any encoding errors in these cases as > the errors would be "forever" in the Git database. That's why I call > die() on errors for these cases to abort whatever we are doing. > > Other code paths do not write to the Git database (e.g. during "git > checkout" we use the code to ensure that we are moving away from the > exact state that we think we are moving away). In these code paths I am > less concerned about encoding errors. I also don't want to abort the > operation (e.g. "git checkout") in these cases. That's why I only inform > the user about the problem with an error message. Warning the users early while they are doing non-writing operation to give them chance to adjust the contents, before they actually need to register the contents as objects by writing, at which point we need to die. That's a reasonable distinction and all of that I already agree with. What was questionable and left unexplained was why this roundtrip thing needs to be different. > The encoding round-trip check can be expensive. That's why I decided > initially to only execute the check in the "critical/important" > write-to-Git-database situations (CONV_WRITE_OBJECT flag!). I also > decided to run it only if the "SHIFT-JIS" encoding is used as this was > the only encoding that I could find which reportedly does not round-trip > with UTF-8 (although I was not able to replicate the round-trip > problems). I still do not see why you have problems with the approach of maintaining a configurable set of "iffy" encodings (and throw SJIS into the default list) to achieve all of the above and more. For SJIS users, instead of having to set environment variables to obtain safe behaviour, they automatically get safe behaviour. When using encodings that are not problematic, they do not need to spend cycles checking round-trip. And when SJIS users know they do not care about roundtrip checks, they can just configure SJIS away from the list.