On Thu, Jan 09, 2025 at 11:52:55PM +0000, Olly Betts wrote: > git grep '^$' seems to match at the end of the file, reporting a line > number one greater than the number of lines in that file. This does > not match the behaviour of grep. > > To reproduce: > > $ git init -q git-grep-bug > $ cd git-grep-bug > $ echo test > test.txt > $ git add test.txt > $ git commit -m test > [master (root-commit) 55b48b26] test > 1 file changed, 1 insertion(+) > create mode 100644 test.txt > $ git grep -n '^$' > test.txt:2: > $ grep -n '^$' test.txt Interesting case. Bisection shows that it started doing that in 34349bea60 (Merge branch 'jc/grep-lookahead', 2010-01-20). So it has been that way for quite a long time. But it is doubly curious, since neither of the parent trees exhibit the behavior. It is the merge itself which causes the problem. In the first-parent tree 34349bea60^1, we are still calling external "grep", which could explain why we don't see any problem. But building with NO_EXTERNAL_GREP (and confirming that it uses the internal code), it doesn't show the problem either! So where did the bug come from? Puzzled. That branch itself contains a merge, e2d2e383d8 (Merge branch 'jc/maint-1.6.4-grep-lookahead' into jc/maint-grep-lookahead, 2010-01-12). If we merge that into 34349bea60^1, the innocent first-parent, then the bug appears. And that brings in a bunch of lookahead code that could plausibly be the problem. I'm still confused why 34349bea60^2 (which does have the lookahead code) doesn't show the bug. I guess there's some bad interaction with what had happened in the meantime along the first-parent branch. Looking at: git diff 34349bea60^2 34349bea60 -- grep.c builtin-grep.c turns up: diff --git a/builtin-grep.c b/builtin-grep.c index 12833733db..da854fa94f 100644 --- a/builtin-grep.c +++ b/builtin-grep.c @@ -182,8 +182,6 @@ static int grep_file(struct grep_opt *opt, const char *filename) error("'%s': %s", filename, strerror(errno)); return 0; } - if (!st.st_size) - return 0; /* empty file -- no grep hit */ if (!S_ISREG(st.st_mode)) return 0; sz = xsize_t(st.st_size); @@ -198,6 +196,7 @@ static int grep_file(struct grep_opt *opt, const char *filename) return 0; } close(i); + data[sz] = 0; if (opt->relative && opt->prefix_length) filename = quote_path_relative(filename, -1, &buf, opt->prefix); i = grep_buffer(opt, filename, data, sz); @@ -223,7 +222,7 @@ static int grep_cache(struct grep_opt *opt, const char **paths, int cached) * are identical, even if worktree file has been modified, so use * cache version instead */ - if (cached || (ce->ce_flags & CE_VALID)) { + if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) { if (ce_stage(ce)) continue; hit |= grep_sha1(opt, ce->sha1, ce->name, 0); Ah. That middle hunk seems to be the culprit. But that probably means we were looking at uninitialized memory before, and the lookahead code was always wrong (but got lucky when there was a non-zero byte in that final slot). :-/ So probably the issue is the changes from a26345b608 (grep: optimize built-in grep by skipping lines that do not hit, 2010-01-10). I'll stop digging on it for now (but adding Junio to the cc as the author there). Probably it would have been faster just to start with a debugger than to look through the history. ;) -Peff