Re: [PATCH 1/2] diff: add -I<regex> that ignores matching changes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Michał Kępień <michal@xxxxxxx> writes:

> Apart from adding a new field to struct diff_options, also define a new
> flag, XDF_IGNORE_REGEX, for the 'xdl_opts' field of that structure.
> This is done because the xpparam_t structure (which is used for passing
> around the regular expression supplied to -I) is not zeroed out in all
> call stacks involving xdl_diff() and thus only performing a NULL check
> on xpp->ignore_regex could result in xdl_mark_ignorable_regex() treating
> garbage on the stack as a regular expression.  As the 'flags' field of
> xpparam_t is initialized in all call stacks involving xdl_diff(), adding
> a flag check avoids that problem.

You mentioned in your cover letter about this, and I tend to agree
that this feels like a hack, at least from the first glance.  The
next feature that wants to have an optional pointer in xpparam_t and
have the code behave differently with the data pointed by it would
need to waste another bit the same way.  Do you already know (read:
I am not asking you to dig to find out immediately---just asking if
you already know the answer) if there is an inherent reason why they
cannot be memset(0) before use?  It seems like a much better
approach to ensure that we clear the structure.  Doesn't existing
anchors array share the same issue (at least anchors_nr must be
cleared if there is no interesting anchors) already?  IOW, should
"git grep anchors_nr" be a valid way to identify _all_ places where
you need to clear the ignore_regex field?

> +-I<regex>::
> +	Ignore changes whose all lines match <regex>.
> +

The implementation seems to allow only one regex, but I suspect we'd
want to mimic

    $ printf "%s\n" a a a >test_a
    $ printf "%s\n" b b b >test_b
    $ diff -Ia     test_a test_b
    $ diff     -Ib test_a test_b
    $ diff -Ia -Ib test_a test_b

and until that happens, the explanation needs to say something about
earlier <regex> being discarded when this option is given more than
once.

> @@ -5203,6 +5205,17 @@ static int diff_opt_patience(const struct option *opt,
>  	return 0;
>  }
>  
> +static int diff_opt_ignore_regex(const struct option *opt,
> +				 const char *arg, int unset)
> +{
> +	struct diff_options *options = opt->value;
> +
> +	BUG_ON_OPT_NEG(unset);
> +	options->xdl_opts |= XDF_IGNORE_REGEX;
> +	options->ignore_regex = arg;

When given twice or more, the earlier value gets lost (it does not
leak, luckily, though).

> +	return 0;
> +}

If we somehow can lose the use of XDF_IGNORE_REGEX bit, we do not
have to have this as a callback.  Instead, it can be OPT_STRING with
the current semantics of "only the last one is used", or we can use
OPT_STRING_LIST to keep all the expressions.

On the other hand, I wonder if it would be a valid approach to make
the new field(s) in diff_options a "regex_t *ignore_regex" (and add
an "int ignore_regex_nr" next to it, if we shoot for multiple -I
options), instead of "const char *".  For that, we would need a
callback even without XDF_IGNORE_REGEX bit having to futz with
xdl_opts field.

Doing so would give us a chance to compile and notice a malformed
regular expression in diff.c, before it hits xdiff/ layer, which may
or may not be a good thing.

> @@ -1019,6 +1019,39 @@ static void xdl_mark_ignorable(xdchange_t *xscr, xdfenv_t *xe, long flags)
>  	}
>  }

I agree with what you said in the cover letter that it would be a
good idea to name the existing xdl_mark_ignorable() and the new one
in such a way that they look similar and parallel, by renaming the
former to xdl_mark_ignorable_lines or something.

> +static void
> +xdl_mark_ignorable_regex(xdchange_t *xscr, const xdfenv_t *xe,
> +			 const char *ignore_regex)

I know some coding standard do chomp line immediately before the
function name (I grew up with one), but that is not what this
project uses, and judging from the surrounding code, it is not what
the upstream xdiff source we borrowed uses, either.

> +{
> +	xdchange_t *xch;
> +	regex_t regex;
> +
> +	if (regcomp(&regex, ignore_regex, REG_EXTENDED | REG_NEWLINE))
> +		die("invalid regex: %s", ignore_regex);

If we compile in diff.c layer and pass regex_t down here, we won't
have to fail here this deep in the callchain.

> +	for (xch = xscr; xch; xch = xch->next) {
> +		regmatch_t regmatch;
> +		xrecord_t **rec;
> +		int ignore = 1;
> +		long i;
> +
> +		rec = &xe->xdf1.recs[xch->i1];
> +		for (i = 0; i < xch->chg1 && ignore; i++)
> +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> +					      1, &regmatch, 0);
> +		rec = &xe->xdf2.recs[xch->i2];
> +		for (i = 0; i < xch->chg2 && ignore; i++)
> +			ignore = !regexec_buf(&regex, rec[i]->ptr, rec[i]->size,
> +					      1, &regmatch, 0);
> +
> +		/*
> +		 * Do not override --ignore-blank-lines.
> +		 */
> +		xch->ignore |= ignore;

Well, you could optimize out the whole regexp matching by adding

		if (xch->ignore)
			continue;

before the two loops try to find a non-matching line, no?

> +	}
> +}
> +
>  int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  	     xdemitconf_t const *xecfg, xdemitcb_t *ecb) {
>  	xdchange_t *xscr;
> @@ -1040,6 +1073,9 @@ int xdl_diff(mmfile_t *mf1, mmfile_t *mf2, xpparam_t const *xpp,
>  		if (xpp->flags & XDF_IGNORE_BLANK_LINES)
>  			xdl_mark_ignorable(xscr, &xe, xpp->flags);
>  
> +		if ((xpp->flags & XDF_IGNORE_REGEX) && xpp->ignore_regex)
> +			xdl_mark_ignorable_regex(xscr, &xe, xpp->ignore_regex);
> +
>  		if (ef(&xe, xscr, ecb, xecfg) < 0) {
>  
>  			xdl_free_script(xscr);

Thanks for an exciting read ;-)





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux