Re: BUG: git grep behave oddly with alternatives

On 11.01.23 at 19:56, Jeff King wrote:
> On Sun, Jan 08, 2023 at 01:42:04AM +0100, René Scharfe wrote:
>
>> REG_ENHANCED on macOS affects REG_EXTENDED as well.  It unlocks e.g.
>> non-greedy repetitions and inline comments.  Sounds nice, but also
>> potentially surprising.  I was unable to find a current version of
>> the re_format(7) manpage online, unfortunately.
>
> I'm not quite sure what you mean here by "non-greedy repetitions".
> Something like:
>
>   # prefer "foo bar" to "foo bar bar"; only matters for colorizing or
>   # --only-matching
>   git grep -E 'foo.*?bar'
>
> ? If so, then yeah, that changes the meaning of a bare "?" and people
> might be surprised by it.

Right.  To be fair, question mark is a special character and you'd
probably need to quote it anyway if you want to match a literal
question mark.  Otherwise I get:

   $ git grep -E 'foo.*?bar'
   fatal: command line, 'foo.*?bar': repetition-operator operand invalid
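
For illustration only, here is a small sketch of the greedy versus
non-greedy difference discussed above.  It uses Python's re module, which
is not the regex engine Git uses on any platform; it merely happens to
support the "*?" syntax, so the matched spans can be compared directly.
The sample text is made up.

```python
import re

# Made-up sample line containing two possible end points for the match.
text = "foo bar bar"

# Greedy: ".*" consumes as much as possible, so the match extends to the
# last "bar" on the line.
greedy = re.search(r"foo.*bar", text).group(0)

# Non-greedy: ".*?" consumes as little as possible, so the match stops at
# the first "bar".
lazy = re.search(r"foo.*?bar", text).group(0)

print(greedy)
print(lazy)
```

Both patterns select the same line; only the matched span differs, which
is why the distinction matters mainly for colorizing and --only-matching.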

>> Apple's latest version of Git sets NO_REGEX and thus uses
>> compat/regex, if I read their source correctly:
>>
>> https://github.com/apple-oss-distributions/Git/blob/Git-128/src/git/Makefile#L1302
>>
>> The easiest and most consistent option would be to do the same.  But
>> we can't do that, because it would break multibyte support, which was
>> fixed by 1819ad327b (grep: fix multibyte regex handling under macOS,
>> 2022-08-26), which started to use the system regex functions again.
>
> Looks like that NO_REGEX line was dropped by 1819ad327b. So I don't
> think Apple added it themselves; their version is just based on an older
> version of Git (looks like 2.24.3).

Makes sense.

>> Which begs the question: Isn't that a problem for the platforms that
>> still have to use NO_REGEX?  Shouldn't we fix compat/regex?
>
> I'm not sure. I always assumed our fallback was similar-ish to what was
> in glibc and was thus pretty featureful, but that may not be true (it
> actually comes from gawk). It looks like we just didn't pull over the
> multi-byte parts in a997bf423d (compat/regex: get the gawk regex engine
> to compile within git, 2010-08-17).

GAWK removed NO_MBSUPPORT and mbsupport.h in the meantime.
I guess that means they support multi-byte characters everywhere now.

René



