Am 09.08.2017 um 00:09 schrieb Junio C Hamano: > René Scharfe <l.s.r@xxxxxx> writes: > >> Am 08.08.2017 um 16:49 schrieb Johannes Schindelin: >>> Hi René, >>> >>> On Tue, 8 Aug 2017, René Scharfe wrote: >>> >>>> OpenBSD's regex library has a repetition limit (RE_DUP_MAX) of 255. >>>> That's the minimum acceptable value according to POSIX. In t4062 we use >>>> 4096 repetitions in the test "-G matches", though, causing it to fail. >>>> >>>> Do the same as the test "-S --pickaxe-regex" in the same file and search >>>> for a single zero instead. That still suffices to trigger the buffer >>>> overrun in older versions (checked with b7d36ffca02^ and --valgrind on >>>> Linux), simplifies the test a bit, and avoids exceeding OpenBSD's limit. >>> >>> I am afraid not. The 4096 is precisely the page size required to trigger >>> the bug on Windows against which this regression test tries to safeguard. >> >> Checked with b7d36ffca02^ on MinGW now as well and found that it >> segfaults with the proposed change ten out of ten times. > > That is a strange result but I can believe it. > > The reason why I find it strange is that the test wants to run > diff_grep() in diffcore-pickaxe.c with one == NULL (because we are > looking at difference between an initial empty commit and the > current commit that adds 4096-zeroes.txt file), which makes the > current blob (i.e. a page of '0' that may be mmap(2)ed without > trailing NUL to terminate it) scanned via regexec() to look for the > search string. > > I can understand why Dscho originally did "^0{4096}$"; it is to > ensure that the whole page is scanned for 4096 zeroes and then the > code would somehow make sure that there is no more byte until the > end of line, which will force regexec (but not regexec_buf that knows > where the buffer ends) to look at the 4097th byte that does not exist. > > If you change the pattern to just "0" that is not anchored, I'd expect > regexec() that does not know how big the haystack is to just find "0" > at the first byte and happily return without segfaulting (because it > does not even have to scan the remainder of the buffer). > > So I find Dscho's concern quite valid, even though I do believe you > when you say the code somehow segfaults. I just can not tell > how/why it would segfault, though---it is possible that regexec() > implementation is stupid and does not realize that it can return early > reporting success without looking at the rest of the buffer, but > somehow I find it unlikely. > > Puzzled. Good point. Valgrind reports: ==57466== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==57466== Access not within mapped region at address 0x4027000 ==57466== at 0x4C2EDF4: strlen (vg_replace_strmem.c:458) ==57466== by 0x59D9F76: regexec@@GLIBC_2.3.4 (regexec.c:240) ==57466== by 0x54D96E: diff_grep (diffcore-pickaxe.c:0) ==57466== by 0x54DAC3: pickaxe_match (diffcore-pickaxe.c:149) And you can see in our version in compat/regex/regexec.c:241 that the first thing regexec() does is calling strlen(). So to avoid depending on that implementation detail we'd need to use a search string that won't be found (e.g. "1") or with unlimited repetition (e.g. "0*"), right? René