On Fri, Jul 16, 2010 at 19:50, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote: > Ævar Arnfjörð Bjarmason wrote: > >> Well to clarify: The TAP is arguably right, although semantically >> these sort of tests should probably be a SKIP on unsupported >> platforms, not a passing TODO. > > No, we support all platforms people are willing to fix without > uglifying the code too much. So a bug is a bug. Test > prerequisites get used for behavior that is either out of scope > (Posix-style permissions on Windows) or hard to test (signal > delivery to child process in t7502-commit). > > The semantic problem you are describing here is that we have no > separate way to mark bugs that are not consistently reproducible. > A “fixed” test_expect_failure is sometimes a fluke, like in this > example. > > If lucky, you can find an appropriate condition and use > test_expect_success or test_expect_failure as appropriate. In > the general case, that is not always easy. Better to eliminate the > unreproducible bugs. The failure is totally predicated on whether or not REG_STARTEND is available on the system, so in a perfect world (where we couldn't provide a compat regex library) we should detect whether REG_STARTEND is defined in regex.h, and then stick it in GIT-BUILD-OPTIONS. Then you could do: test_expect_success REG_STARTEND 'git grep ile a' ' git grep ile a ' But it's much easier to just fix the replacement regex library so the test works everywhere instead of detecting for REG_STARTEND (e.g. using a helper). >> On Thu, Jul 15, 2010 at 18:44, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > >>> We should also just upgrade the GNU regex library in compat/regex to >>> the version that supports REG_STARTEND. > [...] >> This is what we should be focusing on > > By the way, I have no preference for choice of regex library here. If > something else is easier to get working correctly, that would be great. The glibc one is probably pretty good as far as minimal POSIX DFA engines go. Hopefully you can patch it up to get it to compile on non-GNU systems. More generally, the regex use in Git is something I've wanted to look at more closely. Right now a bunch of Git tools use regexes in one form or another, but they don't do so consistently: config.c:847: !regexec(store.value_regex, value, 0, NULL, 0))); config.c:1155: if (regcomp(store.value_regex, value_regex, builtin/remote.c:1432: if (regcomp(&old_regex, oldurl, REG_EXTENDED)) builtin/remote.c:1436: if (!regexec(&old_regex, urlset[i], 0, NULL, 0)) builtin/blame.c:1963: if (!(reg_error = regcomp(®exp, spec + 1, REG_NEWLINE)) && builtin/blame.c:1964: !(reg_error = regexec(®exp, line, 1, match, 0))) { builtin/config.c:104: if (use_key_regexp && regexec(key_regexp, key_, 0, NULL, 0)) builtin/config.c:107: (do_not_match ^ !!regexec(regexp, (value_?value_:""), 0, NULL, 0))) builtin/config.c:177: if (regcomp(key_regexp, key, REG_EXTENDED)) { builtin/config.c:190: if (regcomp(regexp, regex_, REG_EXTENDED)) { builtin/apply.c:579: if (regcomp(stamp, stamp_regexp, REG_EXTENDED)) { builtin/apply.c:586: status = regexec(stamp, timestamp, ARRAY_SIZE(m), m, 0); sha1_name.c:703: if (regcomp(®ex, prefix, REG_EXTENDED)) sha1_name.c:729: if (!regexec(®ex, p + 2, 0, NULL, 0)) { diff.c:779: if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) { diff.c:2011: if (regcomp(ecbdata.diff_words->word_regex, xdiff-interface.c:276: if (!regexec(®->re, line_buffer, 2, pmatch, 0)) { xdiff-interface.c:323: if (regcomp(®->re, expression, cflags)) diffcore-pickaxe.c:29: while (*data && !regexec(regexp, data, 1, ®match, flags)) { diffcore-pickaxe.c:62: err = regcomp(®ex, needle, REG_EXTENDED | REG_NEWLINE); grep.c:73: err = regcomp(&p->regexp, p->pattern, opt->regflags); grep.c:375: return regexec(preg, line, 1, match, eflags); http-backend.c:567: if (regcomp(&re, c->pattern, REG_EXTENDED)) http-backend.c:569: if (!regexec(&re, dir, 1, out, 0)) { Some of these are supplying REG_EXTENDED, some (like git-grep) allow for passing REG_ICASE, some don't. There's no way to do e.g. do a case insensitive git blame -L'/start/', other than -L'/[sS][tT][aA][rR][tT]/' that is. It would be nice if we could consistently pass regex flags, like -L'/start/i' in this case. This also applies to features like the new ':/string' feature added by Linus, maybe that should optionally be ':/string/i' (or within the scope of the current implementation, ':!i/string' or ':!/string/i'). Regarding regular expression implementations. We might want to look into bundling one implementation and using it everywhere, but I haven't tested that to see if there are significant wins to be had. There are regex implementations (notably GNU awk and GNU grep) that are probably better fits for what Git does, a GNU grep backend for git-log would probably search through revisions with a regex faster, but I haven't tested it so I don't know if it's fast enough to bother with it. Using NFA engines like that also gives you some performance guarantees (see [1][2]). Although that mostly matters in pathological situations. But having Git make that guarantee might be useful, e.g. so Gitweb can offer an interface to git-grep without having the CPU on the host burned up by a pattern like (a*|a*)*. The NFA engines we could use to get those features include Google's RE2 (it's in C++, but we might fall back on e.g. the Plan9 engine), GNU awk/grep, TRE, and probably some others. But maybe this isn't something that's wanted or needed. Does anyone else want a more unified regex interface in Git, or a determanistic regex engine built-in? 1. http://swtch.com/~rsc/regexp/regexp1.html 2. http://stackoverflow.com/questions/1178173/regex-implementation-that-can-handle-machine-generated-regexs-non-backtracking -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html