> > Apart from adding a new field to struct diff_options, also define a new > > flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure. > > This is done because the xpparam_t structure (which is used for passing > > around the regular expression supplied to -I) is not zeroed out in all > > call stacks involving xdl_diff() and thus only performing a NULL check > > on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating > > garbage on the stack as a regular expression. As the 'flags' field of > > xpparam_t is initialized in all call stacks involving xdl_diff(), adding > > a flag check avoids that problem. > > You mentioned in your cover letter about this, and I tend to agree > that this feels like a hack, at least from the first glance. The > next feature that wants to have an optional pointer in xpparam_t and > have the code behave differently with the data pointed by it would > need to waste another bit the same way. Agreed. > Do you already know (read: > I am not asking you to dig to find out immediately---just asking if > you already know the answer) if there is an inherent reason why they > cannot be memset(0) before use? It seems like a much better > approach to ensure that we clear the structure. I do not know of any reason for xpparam_t structures to not be zero-initialized. In fact, they _are_ zero-initialized most of the time; AFAICT, there are just three places in the tree in which they are not. Would you like me to address that in a separate patch in this patch series? > Doesn't existing > anchors array share the same issue (at least anchors_nr must be > cleared if there is no interesting anchors) already? IOW, should > "git grep anchors_nr" be a valid way to identify _all_ places where > you need to clear the ignore_regex field? Yes, the 'anchors' and 'anchors_nr' fields of xpparam_t are also affected by the same problem, but the symptoms are more benign in their case because these fields are only used in xdiff/xpatience.c, i.e. when XDF_PATIENCE_DIFF is set in 'flags' - and as I already mentioned in commit messages, that field is always initialized properly, so there is currently no practical possibility of stack garbage being interpreted as e.g. a pointer to the anchor array. > > +-I<regex>:: > > + Ignore changes whose all lines match <regex>. > > + > > The implementation seems to allow only one regex, but I suspect we'd > want to mimic > > $ printf "%s\n" a a a >test_a > $ printf "%s\n" b b b >test_b > $ diff -Ia test_a test_b > $ diff -Ib test_a test_b > $ diff -Ia -Ib test_a test_b Ah, sure. After skimming through various man pages available online, I was not sure whether all standalone versions of diff which support -I allow that switch to be used multiple times and I thought it would be better to start with the simplest thing possible. If you would rather have the above implemented immediately, I can sure try that in v2. > and until that happens, the explanation needs to say something about > earlier <regex> being discarded when this option is given more than > once. Sorry, where do I look for that explanation? I tried to make -I behave similarly to -G and -S, each of which can be specified more than once, but the last value provided overrides the earlier ones - and I could not find any mention of that behavior in the docs. Please help? > > @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt, > > return 0; > > } > > > > +static int diff_opt_ignore_regex(const struct option *opt, > > + const char *arg, int unset) > > +{ > > + struct diff_options *options = opt->value; > > + > > + BUG_ON_OPT_NEG(unset); > > + options->xdl_opts |= XDF_IGNORE_REGEX; > > + options->ignore_regex = arg; > > When given twice or more, the earlier value gets lost (it does not > leak, luckily, though). Right, as I mentioned above, I just wanted to start with something simple that resembles -G/-S. I will revise this part in v2 if you would like to see support for multiple regular expressions in this patch series. > > + return 0; > > +} > > If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not > have to have this as a callback. Instead, it can be OPT_STRING with > the current semantics of "only the last one is used", or we can use > OPT_STRING_LIST to keep all the expressions. I see, thanks for the hints! > On the other hand, I wonder if it would be a valid approach to make > the new field(s) in diff_options a "regex_t *ignore_regex" (and add > an "int ignore_regex_nr" next to it, if we shoot for multiple -I > options), instead of "const char *". For that, we would need a > callback even without XDF_IGNORE_REGEX bit having to futz with > xdl_opts field. > > Doing so would give us a chance to compile and notice a malformed > regular expression in diff.c, before it hits xdiff/ layer, which may > or may not be a good thing. I have not thought about this approach before, but it sounds elegant to me, at least at first glance - option parsing code sounds like the right place for sanitizing user-provided parameters. Doubly so if we take the multiple -I options approach. I will try this in v2. > > @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags) > > } > > } > > I agree with what you said in the cover letter that it would be a > good idea to name the existing xdl_mark_ignorable() and the new one > in such a way that they look similar and parallel, by renaming the > former to xdl_mark_ignorable_lines or something. Then I will do that in v2. Separate patch? > > +static void > > +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe, > > + const char *ignore_regex) > > I know some coding standard do chomp line immediately before the > function name (I grew up with one), but that is not what this > project uses, and judging from the surrounding code, it is not what > the upstream xdiff source we borrowed uses, either. Oh, right, sorry - force of habit. I will fix this in v2. > > +{ > > + xdchange_t *xch; > > + regex_t regex; > > + > > + if (regcomp(®ex, ignore_regex, REG_EXTENDED | REG_NEWLINE)) > > + die("invalid regex: %s", ignore_regex); > > If we compile in diff.c layer and pass regex_t down here, we won't > have to fail here this deep in the callchain. Agreed - I already responded to this remark above. > > + for (xch = xscr; xch; xch = xch->next) { > > + regmatch_t regmatch; > > + xrecord_t **rec; > > + int ignore = 1; > > + long i; > > + > > + rec = &xe->xdf1.recs[xch->i1]; > > + for (i = 0; i < xch->chg1 && ignore; i++) > > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, > > + 1, ®match, 0); > > + rec = &xe->xdf2.recs[xch->i2]; > > + for (i = 0; i < xch->chg2 && ignore; i++) > > + ignore = !regexec_buf(®ex, rec[i]->ptr, rec[i]->size, > > + 1, ®match, 0); > > + > > + /* > > + * Do not override --ignore-blank-lines. > > + */ > > + xch->ignore |= ignore; > > Well, you could optimize out the whole regexp matching by adding > > if (xch->ignore) > continue; > > before the two loops try to find a non-matching line, no? Ah, of course, it looks so obvious now that you pointed it out :-) I guess I was copy-past^W^W^Wtrying to make xdl_mark_ignorable_regex() look similar to xdl_mark_ignorable(). I will use your suggestion in v2. > > + } > > +} > > + > > int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > > xdemitconf_t const *xecfg, xdemitcb_t *ecb) { > > xdchange_t *xscr; > > @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp, > > if (xpp->flags & XDF_IGNORE_BLANK_LINES) > > xdl_mark_ignorable(xscr, &xe, xpp->flags); > > > > + if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex) > > + xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex); > > + > > if (ef(&xe, xscr, ecb, xecfg) < 0) { > > > > xdl_free_script(xscr); > > Thanks for an exciting read ;-) My pleasure ;-) And thanks for taking a look. Sorry about the long turnaround time, but unfortunately this is the best I can do at the moment. -- Best regards, Michał Kępień