On Fri, Jan 10, 2025 at 01:59:18PM +0100, Andreas Schwab wrote:

> On Jan 10 2025, Jeff King wrote:
>
> > but it is weird to me that patmatch() will match "^$" to the end of the
> > buffer at all. It is just calling regexec_buf() behind the scenes, so I
> > guess this is just a weird special case there, and may even depend on
> > the regex implementation.
>
> Shouldn't the matcher be called with REG_NOTEOL in that case?

Perhaps. If regexec_buf() is assuming we are feeding lines, then without
REG_NOTEOL it thinks the end of the buffer is the end of a line. Which
makes sense, but trips up this case because we are not feeding lines,
but rather a whole buffer. So the final newline is not the start of an
empty line, but the true end of the buffer.

But what if the buffer doesn't end in a newline? In the example, the
file is something like "content\n". But what if it was just "content"?
Then the end of the buffer really is the end of a line, isn't it? And
REG_NOTEOL would not be appropriate.

So without REG_NOTEOL:

  [this is wrong, per the report]
  $ echo content >file.txt
  $ git grep --no-index -n '^$' file.txt
  file.txt:2:

  [this is right]
  $ printf content >file.txt
  $ git grep --no-index -n '^$' file.txt
  $ echo $?
  1

and with it, like this patch:

diff --git a/grep.c b/grep.c
index 4e155ee9e6..7e3b6d9474 100644
--- a/grep.c
+++ b/grep.c
@@ -1467,7 +1467,7 @@ static int look_ahead(struct grep_opt *opt,
 	int hit;
 	regmatch_t m;
 
-	hit = patmatch(p, bol, bol + *left_p, &m, 0);
+	hit = patmatch(p, bol, bol + *left_p, &m, REG_NOTEOL);
 	if (hit < 0)
 		return -1;
 	if (!hit || m.rm_so < 0 || m.rm_eo < 0)

we get:

  [this is now right]
  $ git grep --no-index -n '^$' file.txt
  $ echo $?
  1

  [and this stays right]
  $ printf content >file.txt
  $ git grep --no-index -n '^$' file.txt
  $ echo $?
  1

but:

  [without REG_NOTEOL, this matches]
  $ printf content >file.txt
  $ git grep --no-index -n 't$' file.txt
  file.txt:1:content

  [but with that flag, it no longer does]
  $ printf content >file.txt
  $ git grep --no-index -n 't$' file.txt
  $ echo $?
  1

So I do think "\n" at the end of the buffer is a special case. Perhaps
we should always omit it, and then leave REG_NOTEOL unset, making the
end of the buffer consistently the end of the final line.

Like this, which no longer matches "^$" but does match "t$":

diff --git a/grep.c b/grep.c
index 4e155ee9e6..c4bb9f1081 100644
--- a/grep.c
+++ b/grep.c
@@ -1646,6 +1646,8 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle
 
 	bol = gs->buf;
 	left = gs->size;
+	if (left && gs->buf[left-1] == '\n')
+		left--;
 	while (left) {
 		const char *eol;
 		int hit;

-Peff
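
The three behaviors above can also be poked at outside of git with plain
regcomp()/regexec(). This is only a rough sketch under some assumptions:
the helper names buf_matches() and buf_matches_trimmed() are made up for
illustration and are not in grep.c, it feeds NUL-terminated strings
rather than going through regexec_buf(), and it compiles with
REG_NEWLINE so that "^" and "$" also anchor around embedded newlines.
The "^$" result at the end of the buffer may depend on the regex
implementation.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): return 1 if "pattern" matches
 * anywhere in the NUL-terminated "buf", 0 if it does not, and -1 if
 * the pattern fails to compile. "eflags" is passed through to
 * regexec(), so the caller can pass 0 or REG_NOTEOL.
 */
static int buf_matches(const char *pattern, const char *buf, int eflags)
{
	regex_t re;
	int ret;

	/* REG_NEWLINE: "^" matches after a newline, "$" before one */
	if (regcomp(&re, pattern, REG_NEWLINE | REG_NOSUB))
		return -1;
	ret = regexec(&re, buf, 0, NULL, eflags);
	regfree(&re);
	return ret == 0;
}

/*
 * Hypothetical helper mirroring the idea of the second patch: drop one
 * trailing newline, then match with REG_NOTEOL unset so that the end of
 * the buffer is always the end of the final line.
 */
static int buf_matches_trimmed(const char *pattern, const char *buf)
{
	size_t len = strlen(buf);
	char *copy;
	int ret;

	if (len && buf[len - 1] == '\n')
		len--;
	copy = malloc(len + 1);
	if (!copy)
		return -1;
	memcpy(copy, buf, len);
	copy[len] = '\0';
	ret = buf_matches(pattern, copy, 0);
	free(copy);
	return ret;
}
```

With glibc I would expect buf_matches("^$", "content\n", 0) to return 1
(the reported bug) and 0 once REG_NOTEOL is passed, while
buf_matches("t$", "content", REG_NOTEOL) also returns 0 (the regression).
buf_matches_trimmed() should return 0 for "^$" and 1 for "t$", with or
without the trailing newline in the input.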