On Mon, Nov 08 2021, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes: > >>> So you are comparing >>> >>> * requiring bash and C.UTF-8 locale to be available >>> >>> vs >>> >>> * requiring git built with PCRE >>> >>> assuming that "Dscho says doesn't work with PCRE and you say it >>> works with PCRE" is resolved? They seem roughly the same >>> difficulty to me. >> >> We can hard depend on a git built with PCRE, since the point of this >> thing is to run in GitHub CI, Ubuntu builds git with PCRE, and that's >> unlikely to ever change. > > Yes, so is the availability of bash and C.UTF-8 for the same reason: > we are talking about controlled environment. That is what I meant > by "roughly the same difficulty to me". > > FWIW, I am OK with either approach, as I find the patch in question > is just as readable as any rewrite that would use "grep -P", so... To each his own I guess :) I do find the simple regex of: '[\N{U+202a}-\N{U+202e}\N{U+2066}-\N{U+2069}]' Much easier to understand than something using printf, shell interpolation, and needing to switch around LC_CTYPE to two different values twice on one line. But anyway, that's a matter of taste. What isn't is the issue I noted at the bottom of [1], i.e. if we're relying on '(attr:binary)' we should probably start with an assertion that our idea of "binary" matches reality, or perhaps go back to the -I heuristic. Because now we've got at least one binary non-attribute-marked file, and if files like that ever get updated they might start matching this pattern. Maybe not a big deal, but someone updating those might confusingly trip over this otherwise... 1. https://lore.kernel.org/git/211103.86zgqlhzvz.gmgdl@xxxxxxxxxxxxxxxxxxx/