On Wed, Feb 1, 2023 at 6:03 PM Jeff King <peff@xxxxxxxx> wrote:
> So the regex engine is complaining that it is getting bytes with high
> bits set, but that are not part of a multi-byte character. I.e., it is
> not happy to do bytewise matching, but really wants valid UTF8 in the
> expression.

I did manage to find that the call to regcomp in diff.c's
init_diff_words_data (line 2212 in v2.39.1) is what crashes; I could not
step into it with gdb, however.

Further, the following C program compiles without warnings (except for
the unused main parameters):

```
#include <regex.h>
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

int main(int argc, char **argv) {
    regex_t re;
    int ret = regcomp(&re, "[\xc0-\xff][\x80-\xbf]+", REG_EXTENDED | REG_NEWLINE);
    /* assert(ret != 0); */
    size_t errbuf_size = regerror(ret, &re, NULL, 0);
    char errbuf[errbuf_size];
    regerror(ret, &re, errbuf, errbuf_size);
    printf("%s\n", errbuf);
}
```

```
# CFLAGS='-Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Wshadow -Wpointer-arith -Wcast-qual -pedantic -std=c11'
# cc $CFLAGS regtest.c -o regtest && ./regtest
*** unknown regexp error code ***
```

(the assertion fails because regcomp succeeds!)

So I can neither find out what's to blame nor what to fix.
Here are the linked libraries on macOS (IIUC):

```
# otool -L regtest
regtest:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
# otool -L ./git-diff # from v2.39.1 source build today
./git-diff:
    /System/Library/Frameworks/CoreServices.framework/Versions/A/CoreServices (compatibility version 1.0.0, current version 1141.1.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.11)
    /usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
    /usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 12.0.0, current version 12.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
    /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation (compatibility version 150.0.0, current version 1856.105.0)
```

--
D. Ben Knoble