Re: git grep does not find all occurrences on macOS

On 15.10.24 at 02:26, Taylor Blau wrote:
> On Mon, Oct 14, 2024 at 03:34:02PM +0200, David Gstir wrote:
>> Hi!
>>
>> I encountered a rather subtle issue in git 2.47.0 on macOS 14.7 (installed from Homebrew):
>>
>> git grep will not find all occurrences of string patterns containing a “.” under some
>> conditions. In my case I have an ISO-8859 encoded text file which contains umlauts.
>> If the string I’m grepping for occurs after a non-ASCII character in this file, git grep
>> will not find it.
>>
>> I’ve put up a reproducer here https://github.com/iokill/repro-git-grep-issue, but the gist
>> of it is "git grep quz.baz" on the ISO-8859-encoded file below will not return anything,
>> when it should return the line "quz.baz=3":

Can reproduce on macOS 15.0.1.  Bisects to 1819ad327b (grep: fix
multibyte regex handling under macOS, 2022-08-26), which enabled the use
of the system's regex engine.  grep(1) does find that line.

regexec(3) returns REG_ILLSEQ (illegal byte sequence) for that file,
which makes sense.  Interpreting that result as a non-match of the
whole file is not the best way to handle it, though.  Reporting the
error would be one option.  Turning off lookahead and matching each line
separately might be better.
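
A minimal sketch of the per-line idea, not git's actual code: the helper
name grep_lines() is hypothetical, and the sketch only assumes POSIX
regcomp(3), regexec(3) and regerror(3).  Because REG_ILLSEQ is not defined
on every platform, any result other than 0 or REG_NOMATCH is treated as an
error and reported, rather than being folded into "no match":

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of git): match each line of buf on its
 * own and report regexec(3) failures instead of silently treating them
 * as non-matches.  buf is modified in place (newlines become NULs).
 * Returns the number of matching lines, or -1 if the pattern does not
 * compile; *errors counts lines where regexec() failed.
 */
static int grep_lines(const char *pattern, char *buf, int *errors)
{
	regex_t re;
	int matches = 0;
	char *line, *next;

	*errors = 0;
	if (regcomp(&re, pattern, REG_NOSUB))
		return -1;

	for (line = buf; line && *line; line = next) {
		char *eol = strchr(line, '\n');
		int rc;

		if (eol) {
			*eol = '\0';
			next = eol + 1;
		} else {
			next = NULL;
		}

		rc = regexec(&re, line, 0, NULL, 0);
		if (rc == 0) {
			matches++;
			printf("%s\n", line);
		} else if (rc != REG_NOMATCH) {
			/* e.g. REG_ILLSEQ on macOS; affects this line only */
			char msg[128];

			regerror(rc, &re, msg, sizeof(msg));
			fprintf(stderr, "regexec: %s\n", msg);
			(*errors)++;
		}
	}

	regfree(&re);
	return matches;
}
```

With this shape, an illegal byte sequence on one line can at worst hide
that line, and the user is told about it; a later line such as
"quz.baz=3" is still examined.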

Would setting the attribute working-tree-encoding help here?  Not fully:
The file would be converted to UTF-8 before commit, but git grep without
a tree argument would still read the raw file, without conversion.
Shouldn't it respect the attribute and call convert_to_git()?

Using -P (Perl-compatible regular expressions) would work in the example.

René





