Re: [PATCH] grep: Don't pass a TODO test if REG_STARTEND is supported

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jul 16, 2010 at 19:50, Jonathan Nieder <jrnieder@xxxxxxxxx> wrote:
> Ævar Arnfjörð Bjarmason wrote:
>
>> Well to clarify: The TAP is arguably right, although semantically
>> these sort of tests should probably be a SKIP on unsupported
>> platforms, not a passing TODO.
>
> No, we support all platforms people are willing to fix without
> uglifying the code too much.  So a bug is a bug.  Test
> prerequisites get used for behavior that is either out of scope
> (Posix-style permissions on Windows) or hard to test (signal
> delivery to child process in t7502-commit).
>
> The semantic problem you are describing here is that we have no
> separate way to mark bugs that are not consistently reproducible.
> A “fixed” test_expect_failure is sometimes a fluke, like in this
> example.
>
> If lucky, you can find an appropriate condition and use
> test_expect_success or test_expect_failure as appropriate.  In
> the general case, that is not always easy.  Better to eliminate the
> unreproducible bugs.

The failure is totally predicated on whether or not REG_STARTEND is
available on the system, so in a perfect world (where we couldn't
provide a compat regex library) we should detect whether REG_STARTEND
is defined in regex.h, and then stick it in GIT-BUILD-OPTIONS.

Then you could do:

    test_expect_success REG_STARTEND 'git grep ile a' '
        git grep ile a
    '

But it's much easier to just fix the replacement regex library so the
test works everywhere instead of detecting for REG_STARTEND
(e.g. using a helper).

>> On Thu, Jul 15, 2010 at 18:44, Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote:
>
>>> We should also just upgrade the GNU regex library in compat/regex to
>>> the version that supports REG_STARTEND.
> [...]
>> This is what we should be focusing on
>
> By the way, I have no preference for choice of regex library here.  If
> something else is easier to get working correctly, that would be great.

The glibc one is probably pretty good as far as minimal POSIX DFA
engines go. Hopefully you can patch it up to get it to compile on
non-GNU systems.

More generally, the regex use in Git is something I've wanted to look
at more closely. Right now a bunch of Git tools use regexes in one
form or another, but they don't do so consistently:

    config.c:847:		  !regexec(store.value_regex, value, 0, NULL, 0)));
    config.c:1155:			if (regcomp(store.value_regex, value_regex,
    builtin/remote.c:1432:	if (regcomp(&old_regex, oldurl, REG_EXTENDED))
    builtin/remote.c:1436:		if (!regexec(&old_regex, urlset[i], 0, NULL, 0))
    builtin/blame.c:1963:	if (!(reg_error = regcomp(&regexp, spec + 1,
REG_NEWLINE)) &&
    builtin/blame.c:1964:	    !(reg_error = regexec(&regexp, line, 1,
match, 0))) {
    builtin/config.c:104:	if (use_key_regexp && regexec(key_regexp,
key_, 0, NULL, 0))
    builtin/config.c:107:	    (do_not_match ^ !!regexec(regexp,
(value_?value_:""), 0, NULL, 0)))
    builtin/config.c:177:		if (regcomp(key_regexp, key, REG_EXTENDED)) {
    builtin/config.c:190:		if (regcomp(regexp, regex_, REG_EXTENDED)) {
    builtin/apply.c:579:		if (regcomp(stamp, stamp_regexp, REG_EXTENDED)) {
    builtin/apply.c:586:	status = regexec(stamp, timestamp,
ARRAY_SIZE(m), m, 0);
    sha1_name.c:703:	if (regcomp(&regex, prefix, REG_EXTENDED))
    sha1_name.c:729:		if (!regexec(&regex, p + 2, 0, NULL, 0)) {
    diff.c:779:		if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
    diff.c:2011:				if (regcomp(ecbdata.diff_words->word_regex,
    xdiff-interface.c:276:		if (!regexec(&reg->re, line_buffer, 2, pmatch, 0)) {
    xdiff-interface.c:323:		if (regcomp(&reg->re, expression, cflags))
    diffcore-pickaxe.c:29:		while (*data && !regexec(regexp, data, 1,
&regmatch, flags)) {
    diffcore-pickaxe.c:62:		err = regcomp(&regex, needle, REG_EXTENDED
| REG_NEWLINE);
    grep.c:73:	err = regcomp(&p->regexp, p->pattern, opt->regflags);
    grep.c:375:	return regexec(preg, line, 1, match, eflags);
    http-backend.c:567:		if (regcomp(&re, c->pattern, REG_EXTENDED))
    http-backend.c:569:		if (!regexec(&re, dir, 1, out, 0)) {

Some of these are supplying REG_EXTENDED, some (like git-grep) allow
for passing REG_ICASE, some don't.

There's no way to do e.g. do a case insensitive git blame -L'/start/',
other than -L'/[sS][tT][aA][rR][tT]/' that is. It would be nice if we
could consistently pass regex flags, like -L'/start/i' in this
case. This also applies to features like the new ':/string' feature
added by Linus, maybe that should optionally be ':/string/i' (or
within the scope of the current implementation, ':!i/string' or
':!/string/i').

Regarding regular expression implementations. We might want to look
into bundling one implementation and using it everywhere, but I
haven't tested that to see if there are significant wins to be had.

There are regex implementations (notably GNU awk and GNU grep) that
are probably better fits for what Git does, a GNU grep backend for
git-log would probably search through revisions with a regex faster,
but I haven't tested it so I don't know if it's fast enough to bother
with it.

Using NFA engines like that also gives you some performance guarantees
(see [1][2]). Although that mostly matters in pathological
situations. But having Git make that guarantee might be useful,
e.g. so Gitweb can offer an interface to git-grep without having the
CPU on the host burned up by a pattern like (a*|a*)*.

The NFA engines we could use to get those features include Google's
RE2 (it's in C++, but we might fall back on e.g. the Plan9 engine),
GNU awk/grep, TRE, and probably some others.

But maybe this isn't something that's wanted or needed. Does anyone
else want a more unified regex interface in Git, or a determanistic
regex engine built-in?

1. http://swtch.com/~rsc/regexp/regexp1.html
2. http://stackoverflow.com/questions/1178173/regex-implementation-that-can-handle-machine-generated-regexs-non-backtracking
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]