Junio C Hamano <gitster@xxxxxxxxx> writes: > static const struct funcname_pattern_entry builtin_funcname_pattern[] = { > { "bibtex", "(@[a-zA-Z]{1,}[ \t]*\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$", > REG_EXTENDED }, > { "html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$", REG_EXTENDED }, > { "java", > "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n" > "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$", > REG_EXTENDED }, > { "pascal", > "^((procedure|function|constructor|destructor|interface|" > "implementation|initialization|finalization)[ \t]*.*)$" > "|" > "^(.*=[ \t]*(class|record).*)$", > REG_EXTENDED }, > { "php", "^[\t ]*((function|class).*)", REG_EXTENDED }, > { "python", "^[ \t]*((class|def)[ \t].*)$", REG_EXTENDED }, > { "ruby", "^[ \t]*((class|module|def)[ \t].*)$", > REG_EXTENDED }, > { "tex", > "^(\\\\((sub)*section|chapter|part)\\*{0,1}\{.*)$", > REG_EXTENDED }, > }; On the "Objective-C hunk header" topic, I mentioned that I wondered how a construct like: "( A | B ) | ( C | D )" could possibly work, given that xdiff-interface.c::ff_regexp() uses a regmatch_t array with only two elements to capture (which means it can only capture $0 and $1). It turns out that the function is not really prepared to do this properly. In order to cope with a pattern without _any_ capturing parentheses in the regexp, it has a trick to use $0 if $1 is undefined. In other words, if you write: "junk ( A | B ) | garbage ( C | D )" then lines "junk A", "junk B", "garbage C" and "garbage D" would all match and be on the hunk header line. However, "garbage" is not stripped out (while "junk" does get left out of the capture). This happens not to be a problem with any of the above built-in patterns. The risky one is "pascal", but it's $2 captures the entire line anyway, so it does not make any difference if the code used $0 instead. E.g. a line that matches the second alternative, "foo = class bar", will be captured as $0 and this is the same as (un)captured $2. But it would be a good idea to fix this while our attention to the issue is still fresh. As we discussed earlier, we can replace the top-level "|" with "\n" and fix the semantics of multiple positive regexp to grab $1 (or $0 if $1 is unavaialble) of the first positive one. If we did so, when the pattern "junk ( A | B ) | garbage ( C | D )" matches a line "garbage C", the initial garbage part will not be included inside the capture. Here is such a proposed fix. -- >8 -- diff: fix "multiple regexp" semantics to find hunk header comment When multiple regular expressions are concatenated with "\n", they were traditionally AND'ed together, and only a line that matches _all_ of them were taken as a match. This however is unwieldy when multiple regexp feature is used to specify alternatives. This fixes the semantics to take the first match. A nagative pattern, if matches, makes the line to fail. A positive pattern, if matches, will be the final match and what it captures in $1 is used as the hunk header comment. We could write alternatives using "|" in ERE, but the machinery can only use captured $1 as the hunk header comment (or $0 if there is no match in $1), so you cannot write: "junk ( A | B ) | garbage ( C | D )" and expect both "junk" and "garbage" to get stripped with the existing code. With this fix, you can write it as: "junk ( A | B ) \n garbage ( C | D )" and the way capture works would match the user expectation more naturally. Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> --- diff.c | 2 +- xdiff-interface.c | 17 ++++++++++------- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git c/diff.c w/diff.c index a733010..1bcbbd5 100644 --- c/diff.c +++ w/diff.c @@ -1414,7 +1414,7 @@ static const struct funcname_pattern_entry builtin_funcname_pattern[] = { { "pascal", "^((procedure|function|constructor|destructor|interface|" "implementation|initialization|finalization)[ \t]*.*)$" - "|" + "\n" "^(.*=[ \t]*(class|record).*)$", REG_EXTENDED }, { "php", "^[\t ]*((function|class).*)", REG_EXTENDED }, diff --git c/xdiff-interface.c w/xdiff-interface.c index 7f1a7d3..6c6bb19 100644 --- c/xdiff-interface.c +++ w/xdiff-interface.c @@ -194,26 +194,29 @@ static long ff_regexp(const char *line, long len, char *line_buffer = xstrndup(line, len); /* make NUL terminated */ struct ff_regs *regs = priv; regmatch_t pmatch[2]; - int result = 0, i; + int i; + int result = -1; for (i = 0; i < regs->nr; i++) { struct ff_reg *reg = regs->array + i; - if (reg->negate ^ !!regexec(®->re, - line_buffer, 2, pmatch, 0)) { - free(line_buffer); - return -1; + if (!regexec(®->re, line_buffer, 2, pmatch, 0)) { + if (reg->negate) + goto fail; + break; } } + if (regs->nr <= i) + goto fail; i = pmatch[1].rm_so >= 0 ? 1 : 0; line += pmatch[i].rm_so; result = pmatch[i].rm_eo - pmatch[i].rm_so; if (result > buffer_size) result = buffer_size; else - while (result > 0 && (isspace(line[result - 1]) || - line[result - 1] == '\n')) + while (result > 0 && (isspace(line[result - 1]))) result--; memcpy(buffer, line, result); + fail: free(line_buffer); return result; } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html