On 25/02/17 02:38PM, Phillip Wood wrote: > Hi Justin > > On 12/02/2025 04:18, Justin Tobler wrote: > > Through git-diff(1), a single diff can be generated from a pair of blob > > revisions directly. Unfortunately, there is not a mechanism to compute > > batches of specific file pair diffs in a single process. Such a feature > > is particularly useful on the server-side where diffing between a large > > set of changes is not feasible all at once due to timeout concerns. > > > > To facilitate this, introduce git-diff-pairs(1) which takes the > > null-terminated raw diff format as input on stdin and produces diffs in > > other formats. As the raw diff format already contains the necessary > > metadata, it becomes possible to progressively generate batches of diffs > > without having to recompute rename detection or retrieve object context. > > Something like the following: > > > > git diff-tree -r -z -M $old $new | > > git diff-pairs -p > > > > should generate the same output as `git diff-tree -p -M`. Furthermore, > > each line of raw diff formatted input can also be individually fed to a > > separate git-diff-pairs(1) process and still produce the same output. > > I like the idea of this, I've left a few comments mainly around the UI. > > > +Here's an incomplete list of things that `diff-pairs` could do, but > > +doesn't (mostly in the name of simplicity): > > + > > + - Only `-z` input is accepted, not normal `--raw` input. > > I think only accepting NUL terminated input is fine, but if we want to > accept other formats we should have a plan for how to do that in a > backwards compatible way as we cannot use `-z` to distinguish between input > formats. If in the future we want to support the normal format, we could introduce an `--input-format=normal` option or something along those lines. > > + const char * const usage[] = { > > + N_("git diff-pairs [diff-options]"), > > Normally the option summary printed by "git foo -h" is generated by the > option parser. In this case we don't define any options and use > setup_revisions() instead so we need to provide the option summary > ourselves. Looking at diff-files.c we can add > > "\n" > COMMON_DIFF_OPTIONS_HELP; > > to do that. Would this be preferable even if git-diff-pairs doesn't support all of the common diff options? > > + argc = setup_revisions(argc, argv, &revs, NULL); > > I think we should check that there are no options left on the commandline > after setup_revisions() returns Good call, will do in the next version. > > + /* Don't allow pathspecs at all. */ > > + if (revs.prune_data.nr) > > + usage_with_options(usage, options); > > It is not just pathspecs that we want to reject but all revision related > options. Looking at diff-files.c we can do > > if (rev.pending.nr || > rev.min_age != -1 || rev.max_age != -1 || > rev.max_count != -1) > usage_with_option(usage, options); > > To catch some of that but it still accepts things like "--first-parent", > "--merges" and "--ancestry-path". We may just have to live with that as I > don't think it is worth expanding a huge amount of effort to prevent them. Yes, we should also reject revision as well as pathspec arguments. Will update. > > + if (!revs.diffopt.output_format) > > + revs.diffopt.output_format = DIFF_FORMAT_RAW; > > This matches the other diff plumbing commands but I'm not sure it is the > most helpful default for a command that is supposed to transform raw diffs > into another format. Maybe we should default to DIFF_FORMAT_PATCH? As you mentioned, defaulting to DIFF_FORMAT_RAW isn't the most useful behavior. I agree that it makes more sense to use DIFF_FORMAT_PATCH as the default. Will update in the next version. > > +test_expect_success 'split input across multiple diff-pairs' ' > > This needs a PERL prerequisite I think. I'm a bit unsure what this test adds > compared to the others. This test demonstrates that the raw diff input can be split across separate git-diff-pairs(1) processes and still produce equivilant output which is one of the main usecases for the command. That being said, this test isn't really exercising different behavior of git-diff-pairs(1) itself, so maybe it would be best to drop it. Thanks for the review :) -Justin