On Sun, Oct 10 2021, Johannes Sixt wrote:

> Am 09.10.21 um 04:36 schrieb Jeff King:
>> On Sat, Oct 09, 2021 at 02:58:10AM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> I ran into this while testing the grep coloring patch[1] (but it's
>>> unrelated). Before this commit e.g.:
>>>
>>>     LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100|wc -l
>>>     28333
>>>
>>> So ~3k lines for my last 100 commits, but then:
>>>
>>>     $ LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100 2>&1|grep -c ^warning
>>>     299
>>>
>>> At first I thought it was spewing warnings for every failed re-encoded
>>> line in some cases, because I get hundreds at a time sometimes, but it's
>>> because stderr and stdout I/O buffering is different (a common
>>> case). Adding a "fflush(stderr)" "fixes" that.
>>
>> I don't think the buffering is the issue. By default stderr flushes on
>> lines, and we flush commits after showing them. If you take away "-P"
>> (or look at the combined 2>&1 output in order), you'll see that they are
>> grouped.
>>
>> Now one thing you might notice is that there may be multiple warnings
>> between output commits. But that's because we really are re-encoding
>> each of those intermediate commits to do your --author grep. And if that
>> re-encoding fails, we may well be producing the wrong output, because
>> the matching won't be correct (in your case, presumably the correct
>> output should be _nothing_, because Æ is not an ascii character).
>
> I don't understand why i18n.commitEncoding plays a role here. Isn't it
> an instruction "when you make a commit, mark the commit message having
> this encoding". But grep does not make a commit.
>
> If this were i18n.logOuputEncoding it would make much more sense.
>
> Have I misunderstood the meaning of the two options?

It doesn't, see my later <871r4umfnm.fsf@xxxxxxxxxxxxxxxxxxx> for when I
got it right.

For the amount of warnings etc. it's the same, whether we call iconv
because it's e.g. ascii->utf-8 and that triggers iconv() issues, or
(with i18n.logOutputEncoding) utf-8->ascii.
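
To illustrate why the direction doesn't matter for the warning count,
here is a minimal sketch. It uses Python's built-in codecs as a
stand-in for iconv() (an assumption made for brevity; git itself calls
iconv() from C), and the reencode() helper is hypothetical. A name
containing "Æ" fails to convert either way: its UTF-8 bytes are not
valid ASCII input, and the character has no ASCII representation as
output.

```python
# Illustration only: Python's codecs stand in for iconv() here.
# reencode() is a hypothetical helper, not a git function.
NAME = "\u00c6var"  # "Ævar"


def reencode(data: bytes, src: str, dst: str):
    """Return data converted from src to dst, or None if conversion fails."""
    try:
        return data.decode(src).encode(dst)
    except UnicodeError:
        return None


utf8_bytes = NAME.encode("utf-8")

# ascii->utf-8: the stored bytes are really UTF-8, so reading them as
# ASCII hits an invalid input byte.
ascii_to_utf8 = reencode(utf8_bytes, "ascii", "utf-8")

# utf-8->ascii: the input is valid, but "Æ" cannot be written as ASCII.
utf8_to_ascii = reencode(utf8_bytes, "utf-8", "ascii")

# A pure-ASCII name converts cleanly, so it would produce no warning.
plain = reencode(b"Jeff", "ascii", "utf-8")

print(ascii_to_utf8, utf8_to_ascii, plain)
```

Either failing direction corresponds to one failed conversion per
affected commit, which is why the number of warnings comes out the same.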