On Tue, Jul 23, 2019 at 5:47 AM Johannes Schindelin <Johannes.Schindelin@xxxxxx> wrote:
>
> So when PCRE2 complains about the top two bits not being 0x80, it fails
> to parse the bytes correctly (byte 2 is 0xbb, whose two top bits are
> indeed 0x80).

The error is confusing, but it is not coming from the pattern; it comes from what PCRE2 calls the subject. That means that while going through the repository it found content it tried to match that is not valid UTF-8, like all the PNG files and a few .txt files that are not encoded as UTF-8 (e.g. t/t3900/ISO8859-1.txt).

> Maybe this is a bug in your PCRE2 version? Mine is 10.33... and this
> does not happen here... But then, I don't need the `-I` option, and my
> output looks like this:

The -I option was just an attempt to work around the obvious binary files (like PNG). I assume you should be able to reproduce it with a PCRE2 built without JIT support, regardless of version.

My point was that, unlike in your report, I didn't have any test cases failing, because AFAIK there are no test cases using broken UTF-8 (the ones with binary data are actually valid, zero-terminated UTF-8 strings).

Carlo