Re: *Really* noisy encoding warnings post-v2.33.0

On Sun, Oct 10 2021, Johannes Sixt wrote:

> Am 09.10.21 um 04:36 schrieb Jeff King:
>> On Sat, Oct 09, 2021 at 02:58:10AM +0200, Ævar Arnfjörð Bjarmason wrote:
>> 
>>> I ran into this while testing the grep coloring patch[1] (but it's
>>> unrelated). Before this commit e.g.:
>>>
>>>     LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100|wc -l
>>>     28333
>>>
>>> So ~3k lines for my last 100 commits, but then:
>>>
>>>     $ LC_ALL=C ~/g/git/git -P -c i18n.commitEncoding=ascii log --author=Ævar -100 2>&1|grep -c ^warning
>>>     299
>>>
>>> At first I thought it was spewing warnings for every failed re-encoded
>>> line in some cases, because I get hundreds at a time sometimes, but it's
>>> because stderr and stdout I/O buffering is different (a common
>>> case). Adding a "fflush(stderr)" "fixes" that.
>> 
>> I don't think the buffering is the issue. By default stderr flushes on
>> lines, and we flush commits after showing them. If you take away "-P"
>> (or look at the combined 2>&1 output in order), you'll see that they are
>> grouped.
>> 
>> Now one thing you might notice is that there may be multiple warnings
>> between output commits. But that's because we really are re-encoding
>> each of those intermediate commits to do your --author grep. And if that
>> re-encoding fails, we may well be producing the wrong output, because
>> the matching won't be correct (in your case, presumably the correct
>> output should be _nothing_, because Æ is not an ascii character).
>
> I don't understand why i18n.commitEncoding plays a role here. Isn't it
> an instruction "when you make a commit, mark the commit message as having
> this encoding"? But grep does not make a commit.
>
> If this were i18n.logOutputEncoding it would make much more sense.
>
> Have I misunderstood the meaning of the two options?

It doesn't; see my later <871r4umfnm.fsf@xxxxxxxxxxxxxxxxxxx> for when I
got it right.

As for the number of warnings etc., it's the same either way, whether we
call iconv() for e.g. ascii->utf-8 and that triggers iconv() issues, or
(with i18n.logOutputEncoding) for utf-8->ascii.
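
To illustrate why the direction doesn't matter for the failure itself,
here is a minimal sketch. It is only an analogy: it uses Python's built-in
codecs as a stand-in for iconv(), and try_convert() is a hypothetical
helper made up for this example, not anything in git. It just shows that a
non-ASCII name cannot pass through an "ascii" step on either the input or
the output side.

```python
# Illustration only: Python's codecs stand in for iconv() here.
# try_convert() is a hypothetical helper, not part of git.
name = "Ævar"
utf8_bytes = name.encode("utf-8")  # the "Æ" becomes the two bytes 0xc3 0x86


def try_convert(label, convert):
    """Run one conversion and report whether it failed."""
    try:
        convert()
    except UnicodeError as exc:
        return f"{label}: failed ({type(exc).__name__})"
    return f"{label}: ok"


# Bytes that are really UTF-8, treated as ASCII input (ascii->utf-8):
# decoding fails on the first byte above 0x7f.
ascii_to_utf8 = try_convert(
    "ascii->utf-8",
    lambda: utf8_bytes.decode("ascii").encode("utf-8"),
)

# Valid UTF-8 input, ASCII requested as output (utf-8->ascii):
# encoding fails because "Æ" has no ASCII representation.
utf8_to_ascii = try_convert(
    "utf-8->ascii",
    lambda: utf8_bytes.decode("utf-8").encode("ascii"),
)

print(ascii_to_utf8)
print(utf8_to_ascii)
```

Both conversions fail, just at different stages (decode vs. encode), so
any commit containing such a name would hit an error in either setup.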



