Re: [PATCH v2 2/3] builtin: introduce diff-pairs command

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25/02/17 02:38PM, Phillip Wood wrote:
> Hi Justin
> 
> On 12/02/2025 04:18, Justin Tobler wrote:
> > Through git-diff(1), a single diff can be generated from a pair of blob
> > revisions directly. Unfortunately, there is not a mechanism to compute
> > batches of specific file pair diffs in a single process. Such a feature
> > is particularly useful on the server-side where diffing between a large
> > set of changes is not feasible all at once due to timeout concerns.
> > 
> > To facilitate this, introduce git-diff-pairs(1) which takes the
> > null-terminated raw diff format as input on stdin and produces diffs in
> > other formats. As the raw diff format already contains the necessary
> > metadata, it becomes possible to progressively generate batches of diffs
> > without having to recompute rename detection or retrieve object context.
> > Something like the following:
> > 
> > 	git diff-tree -r -z -M $old $new |
> > 	git diff-pairs -p
> > 
> > should generate the same output as `git diff-tree -p -M`. Furthermore,
> > each line of raw diff formatted input can also be individually fed to a
> > separate git-diff-pairs(1) process and still produce the same output.
> 
> I like the idea of this, I've left a few comments mainly around the UI.
> 
> > +Here's an incomplete list of things that `diff-pairs` could do, but
> > +doesn't (mostly in the name of simplicity):
> > +
> > + - Only `-z` input is accepted, not normal `--raw` input.
> 
> I think only accepting NUL terminated input is fine, but if we want to
> accept other formats we should  have a plan for how to do that in a
> backwards compatible way as we cannot use `-z` to distinguish between input
> formats.

If in the future we want to support the normal format, we could introduce
an `--input-format=normal` option or something along those lines. 

> > +	const char * const usage[] = {
> > +		N_("git diff-pairs [diff-options]"),
> 
> Normally the option summary printed by "git foo -h" is generated by the
> option parser. In this case we don't define any options and use
> setup_revisions() instead so we need to provide the option summary
> ourselves. Looking at diff-files.c we can add
> 
> 	"\n"
> 	COMMON_DIFF_OPTIONS_HELP;
> 
> to do that.

Would this be preferable even if git-diff-pairs doesn't support all of
the common diff options?

> > +	argc = setup_revisions(argc, argv, &revs, NULL);
> 
> I think we should check that there are no options left on the commandline
> after setup_revisions() returns

Good call, will do in the next version.

> > +	/* Don't allow pathspecs at all. */
> > +	if (revs.prune_data.nr)
> > +		usage_with_options(usage, options);
> 
> It is not just pathspecs that we want to reject but all revision related
> options. Looking at diff-files.c we can do
> 
> 	if (rev.pending.nr ||
> 	    rev.min_age != -1 || rev.max_age != -1 ||
> 	    rev.max_count != -1)
> 		usage_with_option(usage, options);
> 
> To catch some of that but it still accepts things like "--first-parent",
> "--merges" and "--ancestry-path". We may just have to live with that as I
> don't think it is worth expanding a huge amount of effort to prevent them.

Yes, we should also reject revision as well as pathspec arguments. Will
update.

> > +	if (!revs.diffopt.output_format)
> > +		revs.diffopt.output_format = DIFF_FORMAT_RAW;
> 
> This matches the other diff plumbing commands but I'm not sure it is the
> most helpful default for a command that is supposed to transform raw diffs
> into another format. Maybe we should default to DIFF_FORMAT_PATCH?

As you mentioned, defaulting to DIFF_FORMAT_RAW isn't the most useful
behavior. I agree that it makes more sense to use DIFF_FORMAT_PATCH as
the default. Will update in the next version.

> > +test_expect_success 'split input across multiple diff-pairs' '
> 
> This needs a PERL prerequisite I think. I'm a bit unsure what this test adds
> compared to the others.

This test demonstrates that the raw diff input can be split across
separate git-diff-pairs(1) processes and still produce equivilant
output which is one of the main usecases for the command. That being
said, this test isn't really exercising different behavior of
git-diff-pairs(1) itself, so maybe it would be best to drop it.

Thanks for the review :)

-Justin




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux