Am 09.08.2017 um 07:29 schrieb Junio C Hamano:
> René Scharfe <l.s.r@xxxxxx> writes:
>
>> Am 09.08.2017 um 00:26 schrieb Junio C Hamano:
>>> ... but in the meantime, I think replacing the test with "0$" to
>>> force the scanner to find either the end of line or the end of the
>>> buffer may be a good workaround.  We do not have to care how many of
>>> random bytes are in front of the last "0" in order to ensure that
>>> the regexec_buf() does not overstep to 4097th byte, while seeing
>>> that regexec() that does not know how long the haystack is has to do
>>> so, no?
>>
>> Our regexec() calls strlen() (see my other reply).
>>
>> Using "0$" looks like the best option to me.
>
> Yeah, it seems that way.  If we want to be close/faithful to the
> original, we could do "^0*$", but the part that is essential to
> trigger the old bug is not the "we have many zeroes" (or "we have
> 4096 zeroes") part, but the "zero is at the end of the string" part,
> so "0$" would be the minimal pattern that also would work for OBSD.

Thought about it a bit more.  "^0{4096}$" checks whether the byte
after the buffer is \n or \0, in the hope of triggering a segfault.
On Linux I can access that byte just fine; perhaps there is no guard
page.  Also there is a 2 in 256 chance of that byte being \n or \0
(provided its value is random), which would cause the test to falsely
report success.

"0$" effectively looks for "0\n" or "0\0", which can only occur after
the buffer.  If that string is found close enough then we may not
trigger a segfault, and report a false positive.

In the face of unreliable segfaults we need to reverse our strategy,
I think.  Searching for something that is not in the buffer (e.g.
"1") and treating both matches and segfaults as confirmation that the
bug is still present should avoid any false positives.  Right?

Thanks,
René