Hi all, On Wed, 9 Aug 2017, David Coppa wrote: > On Wed, Aug 9, 2017 at 4:15 PM, René Scharfe <l.s.r@xxxxxx> wrote: > > Am 09.08.2017 um 08:15 schrieb René Scharfe: > >> Am 09.08.2017 um 07:29 schrieb Junio C Hamano: > >>> René Scharfe <l.s.r@xxxxxx> writes: > >>> > >>>> Am 09.08.2017 um 00:26 schrieb Junio C Hamano: > >>>>> ... but in the meantime, I think replacing the test with "0$" to > >>>>> force the scanner to find either the end of line or the end of the > >>>>> buffer may be a good workaround. We do not have to care how many of > >>>>> random bytes are in front of the last "0" in order to ensure that > >>>>> the regexec_buf() does not overstep to 4097th byte, while seeing > >>>>> that regexec() that does not know how long the haystack is has to do > >>>>> so, no? > >>>> > >>>> Our regexec() calls strlen() (see my other reply). > >>>> > >>>> Using "0$" looks like the best option to me. > >>> > >>> Yeah, it seems that way. If we want to be close/faithful to the > >>> original, we could do "^0*$", but the part that is essential to > >>> trigger the old bug is not the "we have many zeroes" (or "we have > >>> 4096 zeroes") part, but "zero is at the end of the string" part, so > >>> "0$" would be the minimal pattern that also would work for OBSD. > >> > >> Thought about it a bit more. > >> > >> "^0{4096}$" checks if the byte after the buffer is \n or \0 in the > >> hope of triggering a segfault. On Linux I can access that byte just > >> fine; perhaps there is no guard page. Also there is a 2 in 256 > >> chance of the byte being \n or \0 (provided its value is random), > >> which would cause the test to falsely report success. > >> > >> "0$" effectively looks for "0\n" or "0\0", which can only occur > >> after the buffer. If that string is found close enough then we > >> may not trigger a segfault and report a false positive. > >> > >> In the face of unreliable segfaults we need to reverse our strategy, > >> I think. Searching for something not in the buffer (e.g. "1") and > >> considering matches and segfaults as confirmation that the bug is > >> still present should avoid any false positives. Right? > > > > And that's not it either. *sigh* > > > > If the 4097th byte is NUL or LF then we can only hope its access > > triggers a segfault -- there is no other way to distinguish the > > result from a legitimate match when limiting with REG_STARTEND. So > > we have to accept this flakiness. > > > > We can check the value of that byte with [^0] and interpret a > > match as failure, but that adds negation and makes the test more > > complex. > > > > ^0*$ would falsely match if the 4097th byte (and possibly later > > ones) is 0. We need to make sure we check for end-of-line after > > the 4096th byte, not later. > > > > Sorry, Dscho, I thought we could take a shortcut here, but -- as > > you wrote all along -- we can't. > > > > So how about this? > > > > -- >8 -- > > Subject: [PATCH] t4062: use less than 256 repetitions in regex > > > > OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255. > > That's the minimum acceptable value according to POSIX. In t4062 we use > > 4096 repetitions in the test "-G matches", though, causing it to fail. > > Combine two repetition operators, both less than 256, to arrive at 4096 > > zeros instead of using a single one, to fix the test on OpenBSD. > > > > Original-patch-by: David Coppa <dcoppa@xxxxxxxxxxx> > > Signed-off-by: Rene Scharfe <l.s.r@xxxxxx> > > --- > > t/t4062-diff-pickaxe.sh | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/t/t4062-diff-pickaxe.sh b/t/t4062-diff-pickaxe.sh > > index 7c4903f497..1130c8019b 100755 > > --- a/t/t4062-diff-pickaxe.sh > > +++ b/t/t4062-diff-pickaxe.sh > > @@ -14,8 +14,10 @@ test_expect_success setup ' > > test_tick && > > git commit -m "A 4k file" > > ' > > + > > +# OpenBSD only supports up to 255 repetitions, so repeat twice for 64*64=4096. > > test_expect_success '-G matches' ' > > - git diff --name-only -G "^0{4096}$" HEAD^ >out && > > + git diff --name-only -G "^(0{64}){64}$" HEAD^ >out && > > test 4096-zeroes.txt = "$(cat out)" > > ' > > > > -- > > 2.14.0 > > I think this should work w/o problems on OpenBSD: > > $ uname -a > OpenBSD open.bsdgeek.it 6.1 GENERIC#54 amd64 > $ git diff --name-only -G "^0{4096}$" HEAD^ 1>/dev/null > fatal: invalid regex: invalid repetition count(s) > $ git diff --name-only -G "^(0{64}){64}$" HEAD^ 1>/dev/null > $ Perfect! Thank y'all for being so thorough, Dscho