Hi René,

> On 20.10.2024, at 13:02, René Scharfe <l.s.r@xxxxxx> wrote:
>
> regexec(3) can fail. E.g. on macOS it fails if it is used with an UTF-8
> locale to match a valid regex against a buffer containing invalid UTF-8
> characters.
>
> git grep has two ways to search for matches in a file: Either it splits
> its contents into lines and matches them separately, or it matches the
> whole content and figures out line boundaries later. The latter is done
> by look_ahead() and it's quicker in the common case where most files
> don't contain a match.
>
> Fall back to line-by-line matching if look_ahead() encounters an
> regexec(3) error by propagating errors out of patmatch() and bailing out
> of look_ahead() if there is one. This way we at least can find matches
> in lines that contain only valid characters. That matches the behavior
> of grep(1) on macOS.
>
> pcre2match() dies if pcre2_jit_match() or pcre2_match() fail, but since
> we use the flag PCRE2_MATCH_INVALID_UTF it handles invalid UTF-8
> characters gracefully. So implement the fall-back only for regexec(3)
> and leave the PCRE2 matching unchanged.
>
> Reported-by: David Gstir <david@xxxxxxxxxxxxx>
> Signed-off-by: René Scharfe <l.s.r@xxxxxx>

thanks for fixing this! I’ve tested it on my end and your patch works.
Feel free to add my Tested-By.

Thanks,
David
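
For readers following the thread, the fall-back described in the quoted commit message can be sketched roughly as below. This is not the code from the patch: grep_buffer(), count_matching_lines(), the match_fn type and the strict_match() stand-in are hypothetical names made up for illustration, and the sketch assumes the pattern was compiled with REG_NEWLINE so that "no match in the whole buffer" implies "no match in any line". Only regexec(3) and its REG_NOMATCH return value are taken from POSIX.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical matcher signature: 0 on match, REG_NOMATCH on no match,
 * any other value on error (same convention as regexec(3)).
 */
typedef int (*match_fn)(const regex_t *re, const char *text);

static int regexec_match(const regex_t *re, const char *text)
{
	return regexec(re, text, 0, NULL, 0);
}

/*
 * Stand-in for a regexec(3) that rejects invalid UTF-8: reports an
 * error if the text contains byte 0xFF (never valid in UTF-8).
 * The error code is arbitrary; it only has to differ from 0 and
 * REG_NOMATCH.
 */
static int strict_match(const regex_t *re, const char *text)
{
	if (strchr(text, '\xff'))
		return REG_ESPACE;
	return regexec_match(re, text);
}

/*
 * Slow path: match each line separately. Lines on which the matcher
 * reports an error are skipped, so lines containing only valid
 * characters can still be found. Returns -1 if allocation fails.
 */
static int count_matching_lines(const regex_t *re, const char *buf,
				match_fn match)
{
	int hits = 0;
	const char *bol = buf;

	while (*bol) {
		const char *eol = strchr(bol, '\n');
		size_t len = eol ? (size_t)(eol - bol) : strlen(bol);
		char *line = malloc(len + 1);

		if (!line)
			return -1;
		memcpy(line, bol, len);
		line[len] = '\0';
		if (match(re, line) == 0)
			hits++;
		free(line);
		bol += len;
		if (*bol == '\n')
			bol++;
	}
	return hits;
}

/*
 * Fast path first: match the whole buffer once. Only a clean
 * REG_NOMATCH lets us skip the file; a match or an error both lead
 * to line-by-line matching.
 */
static int grep_buffer(const regex_t *re, const char *buf, match_fn match)
{
	if (match(re, buf) == REG_NOMATCH)
		return 0;
	return count_matching_lines(re, buf, match);
}
```

The point of the sketch is the last function: an error from the whole-buffer attempt is treated like "maybe there is a match" rather than "no match", which is what lets the per-line pass still report hits in lines that contain only valid characters.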