Re: [PATCH 14/25] pickaxe -S: remove redundant "sz" check in while-loop

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 04 2021, Junio C Hamano wrote:

> René Scharfe <l.s.r@xxxxxx> writes:
>
>>> -		while (sz &&
>>> -		       !regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
>>> +		while (!regexec_buf(regexp, data, sz, 1, &regmatch, flags)) {
>>
>> This will loop forever for regexes that match an empty string.  An
>> example would be /$/.  Silly, perhaps, but still I understand this check
>> less as an optimization and more as a correctness/robustness thing.
>>
>>>  			flags |= REG_NOTBOL;
>>>  			data += regmatch.rm_eo;
>>>  			sz -= regmatch.rm_eo;
>>> -			if (sz && regmatch.rm_so == regmatch.rm_eo) {
>>> +			if (regmatch.rm_so == regmatch.rm_eo) {
>>>  				data++;
>>>  				sz--;
>>>  			}
>>
>> Before, if the match was an empty string and there was more data after
>> it, then the code would consume a character anyway, in order to avoid
>> matching the same empty string again.  With the patch, that character
>> is consumed even if there is no more data.  This leaves 'data'
>> pointing beyond the buffer and 'sz' rolls over to ULONG_MAX.  Oops. :(
>
> While I do not care too much about NUL in the haystack, I do not
> mind [13/25] either.  But this is bad.
>
> This whole thing reminds me of f53c5de2 (pickaxe: fix segfault with
> '-S<...> --pickaxe-regex', 2017-03-18), by the way.

René: Well spotted, thanks, and Oops.

I've just sent a separate series with 01-10 of this one. I'm sitting on
the diffcore-pickaxe patches for a while. I've got local fixes for a lot
of issues in it, will fix this one too.

I've optimized the PCRE v2 codepath a lot more in my local
version. Current results are:

    4209.1: git log -S'int main' <limit-rev>..                                0.38(0.36+0.01)   0.37(0.33+0.04) -2.6%
    4209.2: git log -S'æ' <limit-rev>..                                       0.51(0.47+0.04)   0.32(0.27+0.05) -37.3%
    4209.3: git log --pickaxe-regex -S'(int|void|null)' <limit-rev>..         0.72(0.68+0.03)   0.57(0.54+0.03) -20.8%
    4209.4: git log --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..          0.60(0.55+0.02)   0.39(0.34+0.05) -35.0%
    4209.5: git log --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..       0.43(0.40+0.03)   0.50(0.44+0.06) +16.3%
    4209.6: git log -G'(int|void|null)' <limit-rev>..                         0.64(0.55+0.09)   0.63(0.56+0.05) -1.6%
    4209.7: git log -G'if *\([^ ]+ & ' <limit-rev>..                          0.64(0.59+0.05)   0.63(0.56+0.06) -1.6%
    4209.8: git log -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                       0.63(0.54+0.08)   0.62(0.55+0.06) -1.6%
    4209.9: git log -i -S'int main' <limit-rev>..                             0.39(0.35+0.03)   0.38(0.35+0.02) -2.6%
    4209.10: git log -i -S'æ' <limit-rev>..                                   0.39(0.33+0.06)   0.32(0.28+0.04) -17.9%
    4209.11: git log -i --pickaxe-regex -S'(int|void|null)' <limit-rev>..     0.90(0.84+0.05)   0.58(0.53+0.04) -35.6%
    4209.12: git log -i --pickaxe-regex -S'if *\([^ ]+ & ' <limit-rev>..      0.71(0.64+0.06)   0.40(0.37+0.03) -43.7%
    4209.13: git log -i --pickaxe-regex -S'[àáâãäåæñøùúûüýþ]' <limit-rev>..   0.43(0.40+0.03)   0.50(0.46+0.04) +16.3%
    4209.14: git log -i -G'(int|void|null)' <limit-rev>..                     0.64(0.57+0.06)   0.62(0.56+0.05) -3.1%
    4209.15: git log -i -G'if *\([^ ]+ & ' <limit-rev>..                      0.65(0.59+0.06)   0.63(0.54+0.08) -3.1%
    4209.16: git log -i -G'[àáâãäåæñøùúûüýþ]' <limit-rev>..                   0.63(0.55+0.08)   0.62(0.56+0.05) -1.6%

The main optimization was just moving the compilation of the pattern up
the stack into the diff_options struct, the current version in this
thread re-compiles it every time.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux