On Fri, Nov 19, 2021 at 8:08 AM René Scharfe <l.s.r@xxxxxx> wrote: > > Am 19.11.21 um 08:00 schrieb Ævar Arnfjörð Bjarmason: > > > > On Thu, Nov 18 2021, René Scharfe wrote: > > > >> Am 18.11.21 um 19:17 schrieb Junio C Hamano: > >>> Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes: > >> [...] > >>> I guess this is a lot of change in the amount of text involved but > >>> the least amount of actual change in the behaviour. For those with > >>> newer PCREv2, the behaviour would be the same as v2.34.0, and for > >>> others, the behaviour would be the same as v2.33.0. > >>> > >>> Having said all that, because the consensus seems to be that the > >>> whole "when we should match in UTF mode" may need to be rethought, I > >>> think reverting Hamza's [v13 3/3] would be the simplest way to clean > >>> up the mess for v2.34.1 that will give us a cleaner slate to later > >>> build on, than applying this patch. > >> > >> Makes sense to me. It gives a better starting point to solve the issue > >> afresh without getting entangled in mind-melting boolean expressions. > > > > Yes, agreed. As noted I haven't had time to dig deeply into this, but > > from what I've seen so far there doesn't seem to be any obvious way > > forward in terms of a quick fix. > > > > I thought perhaps your patch would be that (but I haven't looked into it > > carefully enough), but since you're on-board with reverting & retrying. > > That patch should fix the edge case without any side-effects -- at least > I haven't seen any reports of ill effects that would apply to it. Since it isn't restricted to log, it will still cause a regression to the `git grep` case with binary data for versions of PCRE2 older than 10.34 and unlike the previous one it might not trigger an error in the testsuite just because we are missing a test for it. > It's > easier to understand and reason about when applied after reverting, I > think. But it's only for grep.c and I don't know the situation in t/. We had been focusing in PCRE in this discussion, but I see the strict behaviour of older PCRE2 as just a "coal mine canary" to point to the bigger problem that we are expecting git's regex to handle safely and correctly from the point of UTF, what is technically a binary match with --color making the mismatch obvious. The issue is not unique to PCRE, and seems Ævar also acknowledges[1] that by seeing the same bug this was attempting to fix with probably some version of glibc's ERE. I suspect FreeBSD's (and derivatives) is also broken and might be throwing REGILLSEQ errors as well, so I think that it is better to revert the whole thing. Carlo [1] https://lore.kernel.org/git/211119.86r1bc4om5.gmgdl@xxxxxxxxxxxxxxxxxxx/