Michał Kępień <michal@xxxxxxx> writes: > Apart from adding a new field to struct diff_options, also define a new > flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure. > This is done because the xpparam_t structure (which is used for passing > around the regular expression supplied to -I) is not zeroed out in all > call stacks involving xdl_diff() and thus only performing a NULL check > on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating > garbage on the stack as a regular expression. As the 'flags' field of > xpparam_t is initialized in all call stacks involving xdl_diff(), adding > a flag check avoids that problem. You mentioned in your cover letter about this, and I tend to agree that this feels like a hack, at least from the first glance. The next feature that wants to have an optional pointer in xpparam_t and have the code behave differently with the data pointed by it would need to waste another bit the same way. Do you already know (read: I am not asking you to dig to find out immediately---just asking if you already know the answer) if there is an inherent reason why they cannot be memset(0) before use? It seems like a much better approach to ensure that we clear the structure. Doesn't existing anchors array share the same issue (at least anchors_nr must be cleared if there is no interesting anchors) already? IOW, should "git grep anchors_nr" be a valid way to identify _all_ places where you need to clear the ignore_regex field? > +-I<regex>:: > + Ignore changes whose all lines match <regex>. > + The implementation seems to allow only one regex, but I suspect we'd want to mimic $ printf "%s\n" a a a >test_a $ printf "%s\n" b b b >test_b $ diff -Ia test_a test_b $ diff -Ib test_a test_b $ diff -Ia -Ib test_a test_b and until that happens, the explanation needs to say something about earlier <regex> being discarded when this option is given more than once. > @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt, > return 0; > } > > +static int diff_opt_ignore_regex(const struct option *opt, > + const char *arg, int unset) > +{ > + struct diff_options *options = opt->value; > + > + BUG_ON_OPT_NEG(unset); > + options->xdl_opts |= XDF_IGNORE_REGEX; > + options->ignore_regex = arg; When given twice or more, the earlier value gets lost (it does not leak, luckily, though). > + return 0; > +} If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not have to have this as a callback. Instead, it can be OPT_STRING with the current semantics of "only the last one is used", or we can use OPT_STRING_LIST to keep all the expressions. On the other hand, I wonder if it would be a valid approach to make the new field(s) in diff_options a "regex_t *ignore_regex" (and add an "int ignore_regex_nr" next to it, if we shoot for multiple -I options), instead of "const char *". For that, we would need a callback even without XDF_IGNORE_REGEX bit having to futz with xdl_opts field. Doing so would give us a chance to compile and notice a malformed regular expression in diff.c, before it hits xdiff/ layer, which may or may not be a good thing. > @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) > } > } I agree with what you said in the cover letter that it would be a good idea to name the existing xdl_mark_ignorable() and the new one in such a way that they look similar and parallel, by renaming the former to xdl_mark_ignorable_lines or something. > +static void > +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, > + const char *ignore_regex) I know some coding standard do chomp line immediately before the function name (I grew up with one), but that is not what this project uses, and judging from the surrounding code, it is not what the upstream xdiff source we borrowed uses, either. > +{ > + xdchange_t *xch; > + regex_t regex; > + > + if (regcomp(®ex, ignore_regex, REG_EXTENDED | REG_NEWLINE)) > + die("invalid regex: %s", ignore_regex); If we compile in diff.c layer and pass regex_t down here, we won't have to fail here this deep in the callchain. > + for (xch = xscr; xch; xch = xch->next) { > + regmatch_t regmatch; > + xrecord_t **rec; > + int ignore = 1; > + long i; > + > + rec = &xe->xdf1.recs[xch->i1]; > + for (i = 0; i < xch->chg1 && ignore; i++) > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, > + 1, ®match, 0); > + rec = &xe->xdf2.recs[xch->i2]; > + for (i = 0; i < xch->chg2 && ignore; i++) > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, > + 1, ®match, 0); > + > + /* > + * Do not override --ignore-blank-lines. > + */ > + xch->ignore |= ignore; Well, you could optimize out the whole regexp matching by adding if (xch->ignore) continue; before the two loops try to find a non-matching line, no? > + } > +} > + > int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > xdemitconf_t const *xecfg, xdemitcb_t *ecb) { > xdchange_t *xscr; > @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > if (xpp->flags & XDF_IGNORE_BLANK_LINES) > xdl_mark_ignorable(xscr, &xe, xpp->flags); > > + if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex) > + xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex); > + > if (ef(&xe, xscr, ecb, xecfg) < 0) { > > xdl_free_script(xscr); Thanks for an exciting read ;-)