Re: [PATCH v2 2/3] builtin: introduce diff-pairs command

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Justin

On 12/02/2025 04:18, Justin Tobler wrote:
Through git-diff(1), a single diff can be generated from a pair of blob
revisions directly. Unfortunately, there is not a mechanism to compute
batches of specific file pair diffs in a single process. Such a feature
is particularly useful on the server-side where diffing between a large
set of changes is not feasible all at once due to timeout concerns.

To facilitate this, introduce git-diff-pairs(1) which takes the
null-terminated raw diff format as input on stdin and produces diffs in
other formats. As the raw diff format already contains the necessary
metadata, it becomes possible to progressively generate batches of diffs
without having to recompute rename detection or retrieve object context.
Something like the following:

	git diff-tree -r -z -M $old $new |
	git diff-pairs -p

should generate the same output as `git diff-tree -p -M`. Furthermore,
each line of raw diff formatted input can also be individually fed to a
separate git-diff-pairs(1) process and still produce the same output.

I like the idea of this, I've left a few comments mainly around the UI.

> +Here's an incomplete list of things that `diff-pairs` could do, but
> +doesn't (mostly in the name of simplicity):
> +
> + - Only `-z` input is accepted, not normal `--raw` input.

I think only accepting NUL terminated input is fine, but if we want to accept other formats we should have a plan for how to do that in a backwards compatible way as we cannot use `-z` to distinguish between input formats.

+	const char * const usage[] = {
+		N_("git diff-pairs [diff-options]"),

Normally the option summary printed by "git foo -h" is generated by the option parser. In this case we don't define any options and use setup_revisions() instead so we need to provide the option summary ourselves. Looking at diff-files.c we can add

	"\n"
	COMMON_DIFF_OPTIONS_HELP;

to do that.

+	argc = setup_revisions(argc, argv, &revs, NULL);

I think we should check that there are no options left on the commandline after setup_revisions() returns

+	/* Don't allow pathspecs at all. */
+	if (revs.prune_data.nr)
+		usage_with_options(usage, options);

It is not just pathspecs that we want to reject but all revision related options. Looking at diff-files.c we can do

	if (rev.pending.nr ||
	    rev.min_age != -1 || rev.max_age != -1 ||
	    rev.max_count != -1)
		usage_with_option(usage, options);

To catch some of that but it still accepts things like "--first-parent", "--merges" and "--ancestry-path". We may just have to live with that as I don't think it is worth expanding a huge amount of effort to prevent them.

+	if (!revs.diffopt.output_format)
+		revs.diffopt.output_format = DIFF_FORMAT_RAW;

This matches the other diff plumbing commands but I'm not sure it is the most helpful default for a command that is supposed to transform raw diffs into another format. Maybe we should default to DIFF_FORMAT_PATCH?


+test_expect_success 'split input across multiple diff-pairs' '

This needs a PERL prerequisite I think. I'm a bit unsure what this test adds compared to the others.

Best Wishes

Phillip

+	write_script split-raw-diff "$PERL_PATH" <<-\EOF &&
+	$/ = "\0";
+	while (<>) {
+	  my $meta = $_;
+	  my $path = <>;
+	  # renames have an extra path
+	  my $path2 = <> if $meta =~ /[RC]\d+/;
+
+	  open(my $fh, ">", sprintf "diff%03d", $.);
+	  print $fh $meta, $path, $path2;
+	}
+	EOF
+
+	git diff-tree -p -M -C -C base new >expect &&
+
+	git diff-tree -r -z -M -C -C base new |
+	./split-raw-diff &&
+	for i in diff*; do
+		git diff-pairs -p <$i || return 1
+	done >actual &&
+	test_cmp expect actual
+'
+
+test_done





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux