Since 79d3696cfb4 (git-grep: boolean expression on pattern matching., 2006-06-30) the "pattern_expression" member has been used for complex queries (AND/OR...), with "pattern_list" being used for the simple OR queries. Since then we've used both "pattern_expression" and its associated boolean "extended" member to see if we have a complex expression. Since f41fb662f57 (revisions API: have release_revisions() release "grep_filter", 2022-04-13) we've had a subtle bug relating to that: If we supplied options that were only used for "complex queries", but didn't supply the query itself we'd set "opt->extended", but would have a NULL "pattern_expression". As a result these would segfault as we tried to call "free_grep_patterns()" from "release_revisions()": git -P log -1 --invert-grep git -P log -1 --all-match The root cause of this is that we were conflating the state management we needed in "compile_grep_patterns()" itself with whether or not we had an "opt->pattern_expression" later on. In this cases as we're going through "compile_grep_patterns()" we have no "opt->pattern_list" but have "opt->no_body_match" or "opt->all_match". So we'd set "opt->extended = 1", but not "return" on "opt->extended" as that's an "else if" in the same "if" statement. That behavior is intentional and required, as the common case is that we have an "opt->pattern_list" that we're about to parse into the "opt->pattern_expression". But we don't need to keep track of this "extended" flag beyond the state management in compile_grep_patterns() itself. It needs it, but once we're out of that function we can rely on "opt->pattern_expression" being non-NULL instead for using these extended patterns. As 79d3696cfb4 itself shows we've assumed that there's a one-to-one mapping between the two since the very beginning. I.e. "match_line()" would check "opt->extended" to see if it should call "match_expr()", and the first thing we do in that function is assume that we have a "opt->pattern_expression". We'd then call "match_expr_eval()", which would have died if that "opt->pattern_expression" was NULL. The "die" was added in c922b01f54c (grep: fix segfault when "git grep '('" is given, 2009-04-27), and can now be removed as it's now clearly unreachable. We still do the right thing in the case that prompted that fix: git grep '(' fatal: unmatched parenthesis Arguably neither the "--invert-grep" option added in [1] nor the earlier "--all-match" option added in [2] were intended to be used stand-alone, and another approach[3] would be to error out in those cases. But since we've been treating them as a NOOP when given without --grep for a long time let's keep doing that. We could also return in "free_pattern_expr()" if the argument is non-NULL, as an alternative fix for this segfault does [4]. That would be more elegant in making the "free_*()" function behave like "free()", but it would also remove a sanity check: The "free_pattern_expr()" function calls itself recursively, and only the top-level is allowed to be NULL, let's not conflate those two conditions. 1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12) 2. 0ab7befa31d (grep --all-match, 2006-09-27) 3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@xxxxxxxxx/ 4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@xxxxxxxxxxxx Reported-by: orygaw <orygaw@xxxxxxxxxxxxxx> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> --- Per the v1 feedback this v2 doesn't change any user-observable behavior, except to address the segfault(s). Per https://lore.kernel.org/git/Y0Rg9My2EaWl%2FWCU@nand.local/ it sounds like Taylor's preparing his own v2 with a more narrow fix focused on https://lore.kernel.org/git/Y0Rg9My2EaWl%2FWCU@nand.local/; which we may or may not want to have instead of this as a quicker band-aid. But this attempts to address the root cause of the problem, which is that grep.[ch] is effectively juggling two struct members that mean the same thing, but whose state drifted apart. Passing CI at: https://github.com/avar/git/actions/runs/3225227595 (actually https://github.com/avar/git/actions/runs/3225813488, but when I was about to submit this I made a trivial whitespace fix in the t/*.sh change). Range-diff against v1: 1: f4b90799fce ! 1: 6ad7627706f log: require --grep for --invert-grep and --all-match, fix segfault @@ Metadata Author: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> ## Commit message ## - log: require --grep for --invert-grep and --all-match, fix segfault + grep.c: remove "extended" in favor of "pattern_expression", fix segfault - Neither the "--invert-grep" option added in [1] nor the earlier - "--all-match" option added in [2] were intended to be used - stand-alone. + Since 79d3696cfb4 (git-grep: boolean expression on pattern matching., + 2006-06-30) the "pattern_expression" member has been used for complex + queries (AND/OR...), with "pattern_list" being used for the simple OR + queries. Since then we've used both "pattern_expression" and its + associated boolean "extended" member to see if we have a complex + expression. - But due to how the built-in and the revision API interacted those - options without the corresponding --grep would be ignored. + Since f41fb662f57 (revisions API: have release_revisions() release + "grep_filter", 2022-04-13) we've had a subtle bug relating to that: If + we supplied options that were only used for "complex queries", but + didn't supply the query itself we'd set "opt->extended", but would + have a NULL "pattern_expression". As a result these would segfault as + we tried to call "free_grep_patterns()" from "release_revisions()": - Then in f41fb662f57 (revisions API: have release_revisions() release - "grep_filter", 2022-04-13) this turned into a segfault, as we'd - attempt to free() the non-existing --grep patterns. + git -P log -1 --invert-grep + git -P log -1 --all-match - Arguably it makes more sense to add this check to - compile_grep_patterns(), since it's possible to use the C API in the - same way and trigger this segfault. But in practice the revision.c API - is the only user of "no_body_match", and by placing the check here we - can more sensibly emit a message that assumes that the user used - "--invert-grep" without "--grep". + The root cause of this is that we were conflating the state management + we needed in "compile_grep_patterns()" itself with whether or not we + had an "opt->pattern_expression" later on. + + In this cases as we're going through "compile_grep_patterns()" we have + no "opt->pattern_list" but have "opt->no_body_match" or + "opt->all_match". So we'd set "opt->extended = 1", but not "return" on + "opt->extended" as that's an "else if" in the same "if" statement. + + That behavior is intentional and required, as the common case is that + we have an "opt->pattern_list" that we're about to parse into the + "opt->pattern_expression". + + But we don't need to keep track of this "extended" flag beyond the + state management in compile_grep_patterns() itself. It needs it, but + once we're out of that function we can rely on + "opt->pattern_expression" being non-NULL instead for using these + extended patterns. + + As 79d3696cfb4 itself shows we've assumed that there's a one-to-one + mapping between the two since the very beginning. I.e. "match_line()" + would check "opt->extended" to see if it should call "match_expr()", + and the first thing we do in that function is assume that we have a + "opt->pattern_expression". We'd then call "match_expr_eval()", which + would have died if that "opt->pattern_expression" was NULL. + + The "die" was added in c922b01f54c (grep: fix segfault when "git grep + '('" is given, 2009-04-27), and can now be removed as it's now clearly + unreachable. We still do the right thing in the case that prompted + that fix: + + git grep '(' + fatal: unmatched parenthesis + + Arguably neither the "--invert-grep" option added in [1] nor the + earlier "--all-match" option added in [2] were intended to be used + stand-alone, and another approach[3] would be to error out in those + cases. But since we've been treating them as a NOOP when given without + --grep for a long time let's keep doing that. + + We could also return in "free_pattern_expr()" if the argument is + non-NULL, as an alternative fix for this segfault does [4]. That would + be more elegant in making the "free_*()" function behave like + "free()", but it would also remove a sanity check: The + "free_pattern_expr()" function calls itself recursively, and only the + top-level is allowed to be NULL, let's not conflate those two + conditions. 1. 22dfa8a23de (log: teach --invert-grep option, 2015-01-12) 2. 0ab7befa31d (grep --all-match, 2006-09-27) + 3. https://lore.kernel.org/git/patch-1.1-f4b90799fce-20221010T165711Z-avarab@xxxxxxxxx/ + 4. http://lore.kernel.org/git/7e094882c2a71894416089f894557a9eae07e8f8.1665423686.git.me@xxxxxxxxxxxx Reported-by: orygaw <orygaw@xxxxxxxxxxxxxx> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> - ## revision.c ## -@@ revision.c: int setup_revisions(int argc, const char **argv, struct rev_info *revs, struct s - die(_("options '%s' and '%s' cannot be used together"), "--no-walk", "--graph"); - if (!revs->reflog_info && revs->grep_filter.use_reflog_filter) - die(_("the option '%s' requires '%s'"), "--grep-reflog", "--walk-reflogs"); -+ if (!revs->grep_filter.pattern_expression) { -+ if (revs->grep_filter.no_body_match) -+ die(_("the option '%s' requires '%s'"), "--invert-grep", "--grep"); -+ if (revs->grep_filter.all_match) -+ die(_("the option '%s' requires '%s'"), "--all-match", "--grep"); -+ } + ## grep.c ## +@@ grep.c: void compile_grep_patterns(struct grep_opt *opt) + { + struct grep_pat *p; + struct grep_expr *header_expr = prep_header_patterns(opt); ++ int extended = 0; + + for (p = opt->pattern_list; p; p = p->next) { + switch (p->token) { +@@ grep.c: void compile_grep_patterns(struct grep_opt *opt) + compile_regexp(p, opt); + break; + default: +- opt->extended = 1; ++ extended = 1; + break; + } + } + + if (opt->all_match || opt->no_body_match || header_expr) +- opt->extended = 1; +- else if (!opt->extended) ++ extended = 1; ++ else if (!extended) + return; + + p = opt->pattern_list; +@@ grep.c: void free_grep_patterns(struct grep_opt *opt) + free(p); + } - if (revs->line_level_traverse && - (revs->diffopt.output_format & ~(DIFF_FORMAT_PATCH | DIFF_FORMAT_NO_OUTPUT))) +- if (!opt->extended) ++ if (!opt->pattern_expression) + return; + free_pattern_expr(opt->pattern_expression); + } +@@ grep.c: static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, + { + int h = 0; + +- if (!x) +- die("Not a valid grep expression"); + switch (x->node) { + case GREP_NODE_TRUE: + h = 1; +@@ grep.c: static int match_line(struct grep_opt *opt, + struct grep_pat *p; + int hit = 0; + +- if (opt->extended) ++ if (opt->pattern_expression) + return match_expr(opt, bol, eol, ctx, col, icol, + collect_hits); + +@@ grep.c: static int should_lookahead(struct grep_opt *opt) + { + struct grep_pat *p; + +- if (opt->extended) ++ if (opt->pattern_expression) + return 0; /* punt for too complex stuff */ + if (opt->invert) + return 0; + + ## grep.h ## +@@ grep.h: struct grep_opt { + #define GREP_BINARY_TEXT 2 + int binary; + int allow_textconv; +- int extended; + int use_reflog_filter; + int relative; + int pathname; ## t/t4202-log.sh ## @@ t/t4202-log.sh: test_expect_success 'log --grep' ' test_cmp expect actual ' -+test_expect_success 'log --invert-grep usage' ' -+ test_expect_code 128 git log --invert-grep -+' ++for noop_opt in --invert-grep --all-match ++do ++ test_expect_success "log $noop_opt without --grep is a NOOP" ' ++ git log >expect && ++ git log $noop_opt >actual && ++ test_cmp expect actual ++ ' ++done + cat > expect << EOF second initial - - ## t/t7810-grep.sh ## -@@ t/t7810-grep.sh: test_expect_success 'log with multiple --grep uses union' ' - test_cmp expect actual - ' - -+test_expect_success 'log --all-match usage' ' -+ test_expect_code 128 git log --all-match -+' -+ - test_expect_success 'log --all-match with multiple --grep uses intersection' ' - git log --all-match --grep=i --grep=r --format=%s >actual && - { grep.c | 15 +++++++-------- grep.h | 1 - t/t4202-log.sh | 9 +++++++++ 3 files changed, 16 insertions(+), 9 deletions(-) diff --git a/grep.c b/grep.c index 52a894c9890..06eed694936 100644 --- a/grep.c +++ b/grep.c @@ -708,6 +708,7 @@ void compile_grep_patterns(struct grep_opt *opt) { struct grep_pat *p; struct grep_expr *header_expr = prep_header_patterns(opt); + int extended = 0; for (p = opt->pattern_list; p; p = p->next) { switch (p->token) { @@ -717,14 +718,14 @@ void compile_grep_patterns(struct grep_opt *opt) compile_regexp(p, opt); break; default: - opt->extended = 1; + extended = 1; break; } } if (opt->all_match || opt->no_body_match || header_expr) - opt->extended = 1; - else if (!opt->extended) + extended = 1; + else if (!extended) return; p = opt->pattern_list; @@ -790,7 +791,7 @@ void free_grep_patterns(struct grep_opt *opt) free(p); } - if (!opt->extended) + if (!opt->pattern_expression) return; free_pattern_expr(opt->pattern_expression); } @@ -971,8 +972,6 @@ static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, { int h = 0; - if (!x) - die("Not a valid grep expression"); switch (x->node) { case GREP_NODE_TRUE: h = 1; @@ -1052,7 +1051,7 @@ static int match_line(struct grep_opt *opt, struct grep_pat *p; int hit = 0; - if (opt->extended) + if (opt->pattern_expression) return match_expr(opt, bol, eol, ctx, col, icol, collect_hits); @@ -1370,7 +1369,7 @@ static int should_lookahead(struct grep_opt *opt) { struct grep_pat *p; - if (opt->extended) + if (opt->pattern_expression) return 0; /* punt for too complex stuff */ if (opt->invert) return 0; diff --git a/grep.h b/grep.h index bdcadce61b8..6075f997e68 100644 --- a/grep.h +++ b/grep.h @@ -151,7 +151,6 @@ struct grep_opt { #define GREP_BINARY_TEXT 2 int binary; int allow_textconv; - int extended; int use_reflog_filter; int relative; int pathname; diff --git a/t/t4202-log.sh b/t/t4202-log.sh index cc15cb4ff62..2ce2b41174d 100755 --- a/t/t4202-log.sh +++ b/t/t4202-log.sh @@ -249,6 +249,15 @@ test_expect_success 'log --grep' ' test_cmp expect actual ' +for noop_opt in --invert-grep --all-match +do + test_expect_success "log $noop_opt without --grep is a NOOP" ' + git log >expect && + git log $noop_opt >actual && + test_cmp expect actual + ' +done + cat > expect << EOF second initial -- 2.38.0.971.ge79ff6d20e7