Re: git grep: ^$ false match at end of file

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 09, 2025 at 11:52:55PM +0000, Olly Betts wrote:

> git grep '^$' seems to match at the end of the file, reporting a line
> number one greater than the number of lines in that file.  This does
> not match the behaviour of grep.
> 
> To reproduce:
> 
> $ git init -q git-grep-bug
> $ cd git-grep-bug
> $ echo test > test.txt
> $ git add test.txt
> $ git commit -m test
> [master (root-commit) 55b48b26] test
>  1 file changed, 1 insertion(+)
>  create mode 100644 test.txt
> $ git grep -n '^$'
> test.txt:2:
> $ grep -n '^$' test.txt

Interesting case. Bisection shows that it started doing that in
34349bea60 (Merge branch 'jc/grep-lookahead', 2010-01-20). So it has
been that way for quite a long time. But it is doubly curious, since
neither of the parent trees exhibit the behavior. It is the merge itself
which causes the problem.

In the first-parent tree 34349bea60^1, we are still calling external
"grep", which could explain why we don't see any problem. But building
with NO_EXTERNAL_GREP (and confirming that it uses the internal code),
it doesn't show the problem either!

So where did the bug come from? Puzzled.

That branch itself contains a merge, e2d2e383d8 (Merge branch
'jc/maint-1.6.4-grep-lookahead' into jc/maint-grep-lookahead,
2010-01-12). If we merge that into 34349bea60^1, the innocent
first-parent, then the bug appears. And that brings in a bunch of
lookahead code that could plausibly be the problem.

I'm still confused why 34349bea60^2 (which does have the lookahead code)
doesn't show the bug. I guess there's some bad interaction with what had
happened in the meantime along the first-parent branch.

Looking at:

  git diff 34349bea60^2 34349bea60 -- grep.c builtin-grep.c

turns up:

diff --git a/builtin-grep.c b/builtin-grep.c
index 12833733db..da854fa94f 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -182,8 +182,6 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 			error("'%s': %s", filename, strerror(errno));
 		return 0;
 	}
-	if (!st.st_size)
-		return 0; /* empty file -- no grep hit */
 	if (!S_ISREG(st.st_mode))
 		return 0;
 	sz = xsize_t(st.st_size);
@@ -198,6 +196,7 @@ static int grep_file(struct grep_opt *opt, const char *filename)
 		return 0;
 	}
 	close(i);
+	data[sz] = 0;
 	if (opt->relative && opt->prefix_length)
 		filename = quote_path_relative(filename, -1, &buf, opt->prefix);
 	i = grep_buffer(opt, filename, data, sz);
@@ -223,7 +222,7 @@ static int grep_cache(struct grep_opt *opt, const char **paths, int cached)
 		 * are identical, even if worktree file has been modified, so use
 		 * cache version instead
 		 */
-		if (cached || (ce->ce_flags & CE_VALID)) {
+		if (cached || (ce->ce_flags & CE_VALID) || ce_skip_worktree(ce)) {
 			if (ce_stage(ce))
 				continue;
 			hit |= grep_sha1(opt, ce->sha1, ce->name, 0);

Ah. That middle hunk seems to be the culprit. But that probably means we
were looking at uninitialized memory before, and the lookahead code was
always wrong (but got lucky when there was a non-zero byte in that final
slot). :-/

So probably the issue is the changes from a26345b608 (grep: optimize
built-in grep by skipping lines that do not hit, 2010-01-10).

I'll stop digging on it for now (but adding Junio to the cc as the
author there). Probably it would have been faster just to start with a
debugger than to look through the history. ;)

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux