Re: [PATCH v3] rebase --merge: optionally skip upstreamed commits

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/30/2020 12:06 AM, Jonathan Tan wrote:
> When rebasing against an upstream that has had many commits since the
> original branch was created:
> 
>  O -- O -- ... -- O -- O (upstream)
>   \
>    -- O (my-dev-branch)
> 
> it must read the contents of every novel upstream commit, in addition to
> the tip of the upstream and the merge base, because "git rebase"
> attempts to exclude commits that are duplicates of upstream ones. This
> can be a significant performance hit, especially in a partial clone,
> wherein a read of an object may end up being a fetch.
> 
> Add a flag to "git rebase" to allow suppression of this feature. This
> flag only works when using the "merge" backend.

So this is the behavior that already exists, and you are providing a way
to suppress it. However, you also change the default in this patch, which
may surprise users expecting this behavior to continue.

> This flag changes the behavior of sequencer_make_script(), called from
> do_interactive_rebase() <- run_rebase_interactive() <-
> run_specific_rebase() <- cmd_rebase(). With this flag, limit_list()
> (indirectly called from sequencer_make_script() through
> prepare_revision_walk()) will no longer call cherry_pick_list(), and
> thus PATCHSAME is no longer set. Refraining from setting PATCHSAME both
> means that the intermediate commits in upstream are no longer read (as
> shown by the test) and means that no PATCHSAME-caused skipping of
> commits is done by sequencer_make_script(), either directly or through
> make_script_with_merges().
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@xxxxxxxxxx>
> Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
> ---
> This commit contains Junio's sign-off because I based it on
> jt/rebase-allow-duplicate.
> 
> This does not include the fix by Đoàn Trần Công Danh. If we want all
> commits to pass all tests (whether run by Busybox or not) it seems like
> we should squash that patch instead of having it as a separate commit.
> If we do squash, maybe include a "Helped-by" with Đoàn Trần Công Danh's
> name.
> 
> Junio wrote [1]:
> 
>> Sounds much better to me.  I do not mind --[no-]keep-cherry-pick,
>> either, by the way.  I know I raised the possibility of having to
>> make it non-bool later, but since then I haven't thought of a good
>> third option myself anyway, so...
> 
> In that case, I think it's better to stick to bool. This also means that
> the change from the version in jt/rebase-allow-duplicate is very small,
> hopefully aiding reviewers - mostly a replacement of
> --skip-cherry-pick-detection with --keep-cherry-pick (which mean the
> same thing).
> 
> [1] https://lore.kernel.org/git/xmqq4kuakjcn.fsf@xxxxxxxxxxxxxxxxxxxxxx/
> ---
>  Documentation/git-rebase.txt | 21 +++++++++-
>  builtin/rebase.c             |  7 ++++
>  sequencer.c                  |  3 +-
>  sequencer.h                  |  2 +-
>  t/t3402-rebase-merge.sh      | 77 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 107 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> index 0c4f038dd6..f4f8afeb9a 100644
> --- a/Documentation/git-rebase.txt
> +++ b/Documentation/git-rebase.txt
> @@ -318,6 +318,21 @@ See also INCOMPATIBLE OPTIONS below.
>  +
>  See also INCOMPATIBLE OPTIONS below.
>  
> +--keep-cherry-pick::
> +--no-keep-cherry-pick::

I noticed that this _could_ have been simplified to

	--[no-]keep-cherry-pick::

but I also see several uses of either in our documentation. Do we
have a preference? By inspecting the lines before a "no-" string,
I see that some have these two lines, some use the [no-] pattern,
and others highlight the --no-<option> flag completely separately.

> +	Control rebase's behavior towards commits in the working
> +	branch that are already present upstream, i.e. cherry-picks.

I think the "already present upstream" is misleading. We don't rebase
things that are _reachable_ already, but this is probably better as

	Specify if rebase should include commits in the working branch
	that have diffs equivalent to other commits upstream. For example,
	a cherry-picked commit has an equivalent diff.

> ++
> +By default, these commits will be dropped. Because this necessitates
> +reading all upstream commits, this can be expensive in repos with a
> +large number of upstream commits that need to be read.

Now I'm confused. Are they dropped by default? Which option does what?
--keep-cherry-pick makes me think that cherry-picked commits will come
along for the rebase, so we will not check for them. But you have documented
that --no-keep-cherry-pick is the default.

(Also, I keep writing "--[no-]keep-cherry-picks" (plural) because that
seems more natural to me. Then I go back and fix it when I notice.)

> ++
> +If `--keep-cherry-pick is given`, all commits (including these) will be

Bad tick marks: "`--keep-cherry-pick` is given"

> +re-applied. This allows rebase to forgo reading all upstream commits,
> +potentially improving performance.

This reasoning is good. Could you also introduce a config option to make
--keep-cherry-pick the default? I would like to enable that option by
default in Scalar, but could also see partial clones wanting to enable that
by default, too.

> ++
> +See also INCOMPATIBLE OPTIONS below.
> +

Could we just say that his only applies with the --merge option? Why require
the jump to the end of the options section? (After writing this, I go look
at the rest of the doc file and see this is a common pattern.)

>  --rerere-autoupdate::
>  --no-rerere-autoupdate::
>  	Allow the rerere mechanism to update the index with the
> @@ -568,6 +583,9 @@ In addition, the following pairs of options are incompatible:
>   * --keep-base and --onto
>   * --keep-base and --root
>  
> +Also, the --keep-cherry-pick option requires the use of the merge backend
> +(e.g., through --merge).
> +

Will the command _fail_ if someone says --keep-cherry-pick without the merge
backend, or just have no effect? Also, specify the option with ticks and

	`--[no-]keep-cherry-pick`

It seems that none of the options in this section are back-ticked, which I think
is a doc bug.

>  BEHAVIORAL DIFFERENCES
>  -----------------------
>  
> @@ -866,7 +884,8 @@ Only works if the changes (patch IDs based on the diff contents) on
>  'subsystem' did.
>  
>  In that case, the fix is easy because 'git rebase' knows to skip
> -changes that are already present in the new upstream.  So if you say
> +changes that are already present in the new upstream (unless
> +`--keep-cherry-pick` is given). So if you say
>  (assuming you're on 'topic')
>  ------------
>      $ git rebase subsystem
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 8081741f8a..626549b0b2 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -88,6 +88,7 @@ struct rebase_options {
>  	struct strbuf git_format_patch_opt;
>  	int reschedule_failed_exec;
>  	int use_legacy_rebase;
> +	int keep_cherry_pick;
>  };
>  
>  #define REBASE_OPTIONS_INIT {			  	\
> @@ -381,6 +382,7 @@ static int run_rebase_interactive(struct rebase_options *opts,
>  	flags |= opts->rebase_cousins > 0 ? TODO_LIST_REBASE_COUSINS : 0;
>  	flags |= opts->root_with_onto ? TODO_LIST_ROOT_WITH_ONTO : 0;
>  	flags |= command == ACTION_SHORTEN_OIDS ? TODO_LIST_SHORTEN_IDS : 0;
> +	flags |= opts->keep_cherry_pick ? TODO_LIST_KEEP_CHERRY_PICK : 0;

Since opts->keep_cherry_pick is initialized as zero, did you change the default
behavior? Do we not have a test that verifies this behavior when using the merge
backend an no "--keep-cherry-pick" option?

If you initialize it to -1, then you can tell if the --no-keep-cherry-pick option
is specified, which is relevant to my concern below.

>  
>  	switch (command) {
>  	case ACTION_NONE: {
> @@ -1515,6 +1517,8 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>  		OPT_BOOL(0, "reschedule-failed-exec",
>  			 &reschedule_failed_exec,
>  			 N_("automatically re-schedule any `exec` that fails")),
> +		OPT_BOOL(0, "keep-cherry-pick", &options.keep_cherry_pick,
> +			 N_("apply all changes, even those already present upstream")),
>  		OPT_END(),
>  	};
>  	int i;
> @@ -1848,6 +1852,9 @@ int cmd_rebase(int argc, const char **argv, const char *prefix)
>  			      "interactive or merge options"));
>  	}
>  
> +	if (options.keep_cherry_pick && !is_interactive(&options))
> +		die(_("--keep-cherry-pick does not work with the 'apply' backend"));
> +

I see you are failing here. Is this the right decision?

The apply backend will "keep" cherry-picks because it will not look for them upstream.
If anything, shouldn't it be that "--no-keep-cherry-pick" is incompatible?

>  	if (options.signoff) {
>  		if (options.type == REBASE_PRESERVE_MERGES)
>  			die("cannot combine '--signoff' with "
> diff --git a/sequencer.c b/sequencer.c
> index b9dbf1adb0..7bbb63f444 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -4800,12 +4800,13 @@ int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
>  	int keep_empty = flags & TODO_LIST_KEEP_EMPTY;
>  	const char *insn = flags & TODO_LIST_ABBREVIATE_CMDS ? "p" : "pick";
>  	int rebase_merges = flags & TODO_LIST_REBASE_MERGES;
> +	int keep_cherry_pick = flags & TODO_LIST_KEEP_CHERRY_PICK;
>  
>  	repo_init_revisions(r, &revs, NULL);
>  	revs.verbose_header = 1;
>  	if (!rebase_merges)
>  		revs.max_parents = 1;
> -	revs.cherry_mark = 1;
> +	revs.cherry_mark = !keep_cherry_pick;
>  	revs.limited = 1;
>  	revs.reverse = 1;
>  	revs.right_only = 1;
> diff --git a/sequencer.h b/sequencer.h
> index 9f9ae291e3..298b7de1c8 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -148,7 +148,7 @@ int sequencer_remove_state(struct replay_opts *opts);
>   * `--onto`, we do not want to re-generate the root commits.
>   */
>  #define TODO_LIST_ROOT_WITH_ONTO (1U << 6)
> -
> +#define TODO_LIST_KEEP_CHERRY_PICK (1U << 7)
>  
>  int sequencer_make_script(struct repository *r, struct strbuf *out, int argc,
>  			  const char **argv, unsigned flags);
> diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
> index a1ec501a87..64200c5f20 100755
> --- a/t/t3402-rebase-merge.sh
> +++ b/t/t3402-rebase-merge.sh
> @@ -162,4 +162,81 @@ test_expect_success 'rebase --skip works with two conflicts in a row' '
>  	git rebase --skip
>  '
>  
> +test_expect_success '--keep-cherry-pick' '
> +	git init repo &&
> +
> +	# O(1-10) -- O(1-11) -- O(0-10) master
> +	#        \
> +	#         -- O(1-11) -- O(1-12) otherbranch
> +
> +	printf "Line %d\n" $(test_seq 1 10) >repo/file.txt &&
> +	git -C repo add file.txt &&
> +	git -C repo commit -m "base commit" &&
> +
> +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> +	git -C repo commit -a -m "add 11" &&
> +
> +	printf "Line %d\n" $(test_seq 0 10) >repo/file.txt &&
> +	git -C repo commit -a -m "add 0 delete 11" &&
> +
> +	git -C repo checkout -b otherbranch HEAD^^ &&
> +	printf "Line %d\n" $(test_seq 1 11) >repo/file.txt &&
> +	git -C repo commit -a -m "add 11 in another branch" &&
> +
> +	printf "Line %d\n" $(test_seq 1 12) >repo/file.txt &&
> +	git -C repo commit -a -m "add 12 in another branch" &&
> +
> +	# Regular rebase fails, because the 1-11 commit is deduplicated
> +	test_must_fail git -C repo rebase --merge master 2> err &&
> +	test_i18ngrep "error: could not apply.*add 12 in another branch" err &&
> +	git -C repo rebase --abort &&

OK. So here you are demonstrating that the --no-keep-cherry-pick is the
new default. Just trying to be sure that this was intended.

> +
> +	# With --keep-cherry-pick, it works
> +	git -C repo rebase --merge --keep-cherry-pick master
> +'
> +
> +test_expect_success '--keep-cherry-pick refrains from reading unneeded blobs' '
> +	git init server &&
> +
> +	# O(1-10) -- O(1-11) -- O(1-12) master
> +	#        \
> +	#         -- O(0-10) otherbranch
> +
> +	printf "Line %d\n" $(test_seq 1 10) >server/file.txt &&
> +	git -C server add file.txt &&
> +	git -C server commit -m "merge base" &&
> +
> +	printf "Line %d\n" $(test_seq 1 11) >server/file.txt &&
> +	git -C server commit -a -m "add 11" &&
> +
> +	printf "Line %d\n" $(test_seq 1 12) >server/file.txt &&
> +	git -C server commit -a -m "add 12" &&
> +
> +	git -C server checkout -b otherbranch HEAD^^ &&
> +	printf "Line %d\n" $(test_seq 0 10) >server/file.txt &&
> +	git -C server commit -a -m "add 0" &&
> +
> +	test_config -C server uploadpack.allowfilter 1 &&
> +	test_config -C server uploadpack.allowanysha1inwant 1 &&
> +
> +	git clone --filter=blob:none "file://$(pwd)/server" client &&
> +	git -C client checkout origin/master &&
> +	git -C client checkout origin/otherbranch &&
> +
> +	# Sanity check to ensure that the blobs from the merge base and "add
> +	# 11" are missing
> +	git -C client rev-list --objects --all --missing=print >missing_list &&
> +	MERGE_BASE_BLOB=$(git -C server rev-parse master^^:file.txt) &&
> +	ADD_11_BLOB=$(git -C server rev-parse master^:file.txt) &&
> +	grep "\\?$MERGE_BASE_BLOB" missing_list &&
> +	grep "\\?$ADD_11_BLOB" missing_list &&
> +
> +	git -C client rebase --merge --keep-cherry-pick origin/master &&
> +
> +	# The blob from the merge base had to be fetched, but not "add 11"
> +	git -C client rev-list --objects --all --missing=print >missing_list &&
> +	! grep "\\?$MERGE_BASE_BLOB" missing_list &&
> +	grep "\\?$ADD_11_BLOB" missing_list
> +'

I appreciate this test showing that this is accomplishing the goal in
a partial clone. 

Thanks,
-Stolee





[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux