Re: [PATCH] fast-export: fix surprising behavior with --first-parent

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 23/11/2021 14.07, Ævar Arnfjörð Bjarmason wrote:

On Tue, Nov 23 2021, William Sprent via GitGitGadget wrote:

From: William Sprent <williams@xxxxxxxxxxx>

When invoking git-fast-export with the --first-parent flag on a branch
with merges, fast-export would early-out on processing the first merge
on the branch. If combined with --reverse, fast-export would instead
output all single parent commits on the branch.

This commit makes fast-export output the same commits as rev-list
--first-parent, and makes --reverse not have an effect on which commits
are output.

The fix involves removing logic within fast-export which was responsible
for ensuring that parents are processed before their children, which was
what was exiting early due to missing second parents. This is replaced
by setting 'reverse = 1' before revision walking, which, in conjuction
with topo_order, allows for delegating the ordering of commits to
revision.c. The reverse flag is set after parsing rev-list arguments to
avoid having it disabled.

Signed-off-by: William Sprent <williams@xxxxxxxxxxx>
---
     fast-export: fix surprising behavior with --first-parent
Hi, This is my first time patching git, so I probably need some guidance on
     my approach. :)

Hi, thanks for your first contribution to git. This is a rather shallow
review, a deeper one is much deserved.
> I notice that you're removing code in builtin/fast-export.c, presumably
we have code in revision.c that does the same thing. It would really
help a reviewer for you to dig a bit into the relevant commit history
and note it in the commit message.

I.e. could revision.c always do this, and this was always needless
duplication, or at time X it was needed, but as of Y revision.c learned
to do this, and callers A, B and C were adjusted, but just not this
missed call D? etc.


That's a really good suggestion. I should've done that. I did dig just enough to see that the logic has been around since fast-export was introduced, but I didn't check whether the 'reverse' option was part of revision.c at that point. I see that Elijah has done that homework for me later in this thread and discovered that --reverse was introduce a year or so before fast-export though.

-static int has_unshown_parent(struct commit *commit)
-{
-	struct commit_list *parent;
-
-	for (parent = commit->parents; parent; parent = parent->next)
-		if (!(parent->item->object.flags & SHOWN) &&
-		    !(parent->item->object.flags & UNINTERESTING))
-			return 1;
-	return 0;
-}
-
  struct anonymized_entry {
  	struct hashmap_entry hash;
  	const char *anon;
@@ -752,20 +740,6 @@ static char *anonymize_tag(void *data)
  	return strbuf_detach(&out, NULL);
  }
-static void handle_tail(struct object_array *commits, struct rev_info *revs,
-			struct string_list *paths_of_changed_objects)
-{
-	struct commit *commit;
-	while (commits->nr) {
-		commit = (struct commit *)object_array_pop(commits);
-		if (has_unshown_parent(commit)) {
-			/* Queue again, to be handled later */
-			add_object_array(&commit->object, NULL, commits);
-			return;
-		}
-		handle_commit(commit, revs, paths_of_changed_objects);
-	}
-}

...

  static void handle_tag(const char *name, struct tag *tag)
  {
@@ -1185,7 +1159,6 @@ static int parse_opt_anonymize_map(const struct option *opt,
  int cmd_fast_export(int argc, const char **argv, const char *prefix)
  {
  	struct rev_info revs;
-	struct object_array commits = OBJECT_ARRAY_INIT;
  	struct commit *commit;
  	char *export_filename = NULL,
  	     *import_filename = NULL,
@@ -1281,19 +1254,14 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
get_tags_and_duplicates(&revs.cmdline); + revs.reverse = 1;

Is the placement of revs.reverse = 1 here important, or could it go
earlier after init_revision_sources() when we assign some other values
ir revs?

  	if (prepare_revision_walk(&revs))
  		die("revision walk setup failed");

A light reading of prepare_revision_walk() suggests it could come after,
but maybe I'm entirely wrong.

It needs to go after the setup_revisions() call as otherwise a --reverse passed to fast-export will toggle the option off again. You are right in that it can be moved down. I've done that in my working directory for the next patch.

Another option would be to move the revs.reverse up as you suggest and then then call die() if it was toggled off by setup_revisions(). The only downside I can think of is that it would make any current usage of 'fast-export --reverse' go from a no-op to an error.

  	revs.diffopt.format_callback = show_filemodify;
  	revs.diffopt.format_callback_data = &paths_of_changed_objects;
  	revs.diffopt.flags.recursive = 1;
  	while ((commit = get_revision(&revs))) {
-		if (has_unshown_parent(commit)) {
-			add_object_array(&commit->object, NULL, &commits);
-		}
-		else {
-			handle_commit(commit, &revs, &paths_of_changed_objects);
-			handle_tail(&commits, &revs, &paths_of_changed_objects);
-		}
+		handle_commit(commit, &revs, &paths_of_changed_objects);
  	}

Yay code deletion, good if it works (I didn't check).

Since this is just a one-statement while-loop we can also remove its
braces now.
>> +test_expect_success 'fast-export --first-parent outputs all
revisions output by revision walk' '
+	git init first-parent &&
+	cd first-parent &&

Do any such "cd" in a sub-shell:

	git init x &&
	(
     		cd x &&
                 ...
	)
Otherwise the next test after you is going to run in anotherdirectory.

+	test_commit init &&
+	git checkout -b topic1 &&
+	test_commit file2 file2.txt &&
+	git checkout main &&
+	git merge topic1 --no-ff &&
+
+	git checkout -b topic2 &&
+	test_commit file3 file3.txt &&
+	git checkout main &&
+	git merge topic2 --no-ff &&

Just a nit. I'd use "test_commit A", "test_commit B" etc. when the
filenames etc. aren't important. There's no subsequent reference here,
so I assume they're not.

+	test_commit branch-head &&
+
+	git rev-list --format="%ad%B" --first-parent --topo-order --no-commit-header main > expected &&

nit; >expected, not > expected is the usual style.
+
+	git fast-export main -- --first-parent > first-parent-export &&
+	git fast-export main -- --first-parent --reverse > first-parent-reverse-export &&

ditto:

+	git init import && cd import &&

ditto earlier "cd" comment.

+	cat ../first-parent-export | git fast-import &&

Instead of "cat x | prog" do "prog <x".

+	git rev-list --format="%ad%B" --topo-order --all --no-commit-header > actual &&

+	test $(git rev-list --all | wc -l) -eq 4 &&

Instead:

     git rev-list --all >tmp &&
     test_line_count = 4 tmp

(for some value of tmp)

+	test_cmp ../expected actual &&
+	test_cmp ../first-parent-export ../first-parent-reverse-export
+'

Maybe some of the CD-ing around here wouldu be easier by not doing that
and instead running e.g.:

     git -C subdir fast-export >file-not-in-subdir &&
     ...


Thanks for taking the time to give your feedback. :) I'll remove those braces from the while loop and incorporate your comments about the test for v2.



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux