Re: git grep: ^$ false match at end of file

On Fri, Jan 10, 2025 at 06:43:08AM -0500, Jeff King wrote:

> I'll stop digging on it for now (but adding Junio to the cc as the
> author there). Probably it would have been faster just to start with a
> debugger than to look through the history. ;)

OK, my curiosity got the better of me. This fixes it:

diff --git a/grep.c b/grep.c
index 4e155ee9e6..9eac3dd95d 100644
--- a/grep.c
+++ b/grep.c
@@ -1470,10 +1470,12 @@ static int look_ahead(struct grep_opt *opt,
 		hit = patmatch(p, bol, bol + *left_p, &m, 0);
 		if (hit < 0)
 			return -1;
 		if (!hit || m.rm_so < 0 || m.rm_eo < 0)
 			continue;
+		if (m.rm_so == *left_p)
+			continue; /* don't match nothing */
 		if (earliest < 0 || m.rm_so < earliest)
 			earliest = m.rm_so;
 	}
 
 	if (earliest < 0) {

but it is odd to me that patmatch() will match "^$" at the end of the
buffer at all. It is just calling regexec_buf() behind the scenes, so I
guess this is a special case there, and it may even depend on the regex
implementation. If I pass "-P" to use pcre instead, the problem goes
away even without my patch.

If we skip look-ahead the problem also goes away. I'd have thought
match_line() would have the same problem, but there we process line by
line, and regexec_buf() never even sees the newline.

So I guess the rationale is: some regexec implementations report a
zero-width match for "^$" at the very end of the buffer, and we should
not trust that result when matching a whole buffer with newlines.
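
For anyone who wants to poke at this outside of git, here is a minimal
standalone sketch of the same check. It is not git's code:
earliest_empty_line() is a hypothetical helper, and it uses plain
regexec() on a NUL-terminated string instead of regexec_buf(). I am
assuming REG_NEWLINE is what lets ^ and $ anchor inside the buffer. It
searches a whole multi-line buffer for "^$" and discards a match that
starts exactly at the end of the buffer, like the patch does.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not git's look_ahead(): return the offset of the
 * earliest "^$" match in buf, or -1 if there is none. A zero-width match
 * sitting exactly at the end of the buffer is treated as no match,
 * mirroring the "don't match nothing" check in the patch.
 */
static long earliest_empty_line(const char *buf)
{
	regex_t re;
	regmatch_t m;
	size_t len;
	long ret = -1;

	assert(buf != NULL);
	len = strlen(buf);

	/* REG_NEWLINE lets ^ and $ anchor at newlines inside the buffer. */
	if (regcomp(&re, "^$", REG_NEWLINE) != 0)
		return -1;

	/* Accept the match only if it does not start at the end of buf. */
	if (regexec(&re, buf, 1, &m, 0) == 0 &&
	    m.rm_so >= 0 && (size_t)m.rm_so != len)
		ret = (long)m.rm_so;

	regfree(&re);
	return ret;
}
```

With this, earliest_empty_line("a\n") comes back as -1 whether or not
the regex library reports the match at the end of the buffer, while a
real empty line, as in "a\n\nb\n", is still found at offset 2.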

-Peff



