On Mon, May 15, 2017 at 8:24 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: > Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> writes: > >> Factor the test for \0 in grep patterns into a function. Since commit >> 9eceddeec6 ("Use kwset in grep", 2011-08-21) any pattern containing a >> \0 is considered fixed as regcomp() can't handle it. >> >> This limitation was never documented, and other some regular >> expression engines are capable of compiling a pattern containing a >> \0. Factoring this out makes a subsequent change which does that >> smaller. > > The latter half of the first sentence of this paragraph does not > parse well, around "and other some". "never documented" makes > readers expect "... so let's document it", but I think you are > driving in a different direction, something like... I'll try to reword it. > Since some other regular expression engines are capable of a > pattern to match a string with NUL in it, it would be nice > if we can use such an engine and match against NUL; as this > "patterns with NUL always use --fixed match" is not documented, > let's hope that nobody depend on that current behaviour. > This step does not yet change the behaviour yet, but it > makes it easier to do so in later steps. > > perhaps? I tend to agree that it is OK to break users who depended > on that "patterns with NULL match with --fixed" (partly because it > is so unintuitive and not useful behaviour, and partly because it is > easy for them to explicitly say -F), but let's make sure we clearly > say that we will be knowingly breaking them. I think there's a few different things going on here, which I'll address accordingly. * Yes, if someone is relying on the undocumented behavior of a \0 pattern implying -F if and when I change that their stuff will break. I'm going to actually just drop this whole discussion from this commit, we can have an explanation for that when and if I actually change it in a later series. * I'm making things more confusing by saying "engines" here, but it's important to note that git has never in its docs promised to use the C library implementation of POSIX regexes, it's just advertised that it uses POSIX regular *expressions*, which are a different thing. Implementation != syntax. E.g. GNU grep uses POSIX regex syntax, but its engine does not have the same limitation as ours: $ (command cd /tmp && printf "æ\0ð\n" >a && printf "æ\n" >> a; printf "æ\0[öð]" >f; grep -a -f f a; echo $?) æð 0 I went down this particular rabbit hole because I was genuinely surprised that this didn't work with git-grep, until I realized that of course the implementation detail of using C strings for this under the hood was bleeding through.