Re: [PATCH v1] checkout: optionally speed up "git checkout -b foo"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 7/24/2018 3:21 PM, Junio C Hamano wrote:
Ben Peart <Ben.Peart@xxxxxxxxxxxxx> writes:

From: Ben Peart <Ben.Peart@xxxxxxxxxxxxx>

If the new core.optimizecheckout config setting is set to true, speed up
"git checkout -b foo" by avoiding the work to merge the working tree.  This
is valid because no merge needs to occur - only creating the new branch/
updating the refs. Any other options force it through the old code path.

This change in behavior is off by default and behind the config setting so
that users have to opt-in to the optimized behavior.




We've been running with this patch internally for a long time but it was
rejected when I submitted it to the mailing list before because it
implicitly changes the behavior of checkout -b. Trying it again configured
behind a config setting as a potential solution for other optimizations to
checkout that could change the behavior as well.

https://public-inbox.org/git/20180724042740.GB13248@xxxxxxxxxxxxxxxxxxxxx/T/#m75afe3ab318d23f36334cf3a6e3d058839592469

An incorrect link?  It does not look like a thread that explains
what was previously submitted but failed.  The last paragraph looks
like a fine material below the three-dash line.


See my earlier reply about this section in:

https://public-inbox.org/git/xmqqh8koxwwi.fsf@xxxxxxxxxxxxxxxxxxxxxxxxx/T/#mb31136a09dbc1a963a5a62e840b118ac33043edf


Signed-off-by: Ben Peart <Ben.Peart@xxxxxxxxxxxxx>
---

Notes:
     Base Ref: master
     Web-Diff: https://github.com/benpeart/git/commit/f43d934ce7
     Checkout: git fetch https://github.com/benpeart/git checkout-b-v1 && git checkout f43d934ce7

  Documentation/config.txt |  6 +++
  builtin/checkout.c       | 94 ++++++++++++++++++++++++++++++++++++++++
  cache.h                  |  1 +
  config.c                 |  5 +++
  environment.c            |  1 +
  5 files changed, 107 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a32172a43c..2c4f513bf1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -911,6 +911,12 @@ core.commitGraph::
  	Enable git commit graph feature. Allows reading from the
  	commit-graph file.
+core.optimizedCheckout
+	Speed up "git checkout -b foo" by skipping much of the work of a
+	full checkout command.  This changs the behavior as it will skip
+	merging the trees and updating the index and instead only create
+	and switch to the new ref.

By the way, why is it a core.* thing, not checkout.* thing?


I followed the naming convention used by core.sparsecheckout but I'm happy to call it whatever people want.

If a new feature is not necessarily recommendable for normal users
and it needs to be hidden behind an opt-in knob (I do not have a
strong opinion if that is or is not the case for this particular
feature at this point), the documentation for the knob should give a
bit more than "This chang(e)s the behavior" to the readers, I would
think, to be intellectually honest ;-).  Let's tell them what bad
things happen if we pretend that we switched the branch without
twoway merge and the index update to help them make an informed
decision.


I attempted to explain what the change in behavior was in the same sentence by saying what it skips and what it still does. If that isn't intellectually honest, help me know how to better explain it so that it is.

Is this better?

Speed up "git checkout -b <new_branch>" by skipping the twoway merge and index update. Instead, just create a new branch named <new_branch> and switch to it. The working directory and index are left unchanged.

+static int needs_working_tree_merge(const struct checkout_opts *opts,
+	const struct branch_info *old_branch_info,
+	const struct branch_info *new_branch_info)
+{
+	/*
+	 * We must do the merge if we are actually moving to a new
+	 * commit tree.

What's a "commit tree"?  Shouldn't it be just a "commit"?


Sure

+	 */
+	if (!old_branch_info->commit || !new_branch_info->commit ||
+		oidcmp(&old_branch_info->commit->object.oid, &new_branch_info->commit->object.oid))
+		return 1;
+
+	/*
+	 * opts->patch_mode cannot be used with switching branches so is
+	 * not tested here
+	 */
+
+	/*
+	 * opts->quiet only impacts output so doesn't require a merge
+	 */
+
+	/*
+	 * Honor the explicit request for a three-way merge or to throw away
+	 * local changes
+	 */
+	if (opts->merge || opts->force)
+		return 1;
+
+	/*
+	 * --detach is documented as "updating the index and the files in the
+	 * working tree" but this optimization skips those steps so fall through
+	 * to the regular code path.
+	 */
+	if (opts->force_detach)
+		return 1;
+
+	/*
+	 * opts->writeout_stage cannot be used with switching branches so is
+	 * not tested here
+	 */
+
+	/*
+	 * Honor the explicit ignore requests
+	 */
+	if (!opts->overwrite_ignore || opts->ignore_skipworktree ||
+		opts->ignore_other_worktrees)
+		return 1;
+
+	/*
+	 * opts->show_progress only impacts output so doesn't require a merge
+	 */
+
+	/*
+	 * If we aren't creating a new branch any changes or updates will
+	 * happen in the existing branch.  Since that could only be updating
+	 * the index and working directory, we don't want to skip those steps
+	 * or we've defeated any purpose in running the command.
+	 */
+	if (!opts->new_branch)
+		return 1;
+
+	/*
+	 * new_branch_force is defined to "create/reset and checkout a branch"
+	 * so needs to go through the merge to do the reset
+	 */
+	if (opts->new_branch_force)
+		return 1;
+
+	/*
+	 * A new orphaned branch requrires the index and the working tree to be
+	 * adjusted to <start_point>
+	 */
+	if (opts->new_orphan_branch)
+		return 1;
+
+	/*
+	 * Remaining variables are not checkout options but used to track state
+	 */
+
+	return 0;
+}

This helper function alone looks like we are creating a maintenance
nightmare from a quick scan.  How are we going to keep this up to
date?

I offhand do not know how "git checkout -b foo" would behave
differently if we do not do a two-way merge between HEAD and HEAD to
update the index.  We'd still need to list the local modifications
and say "Switched to a new branch 'foo'", but that would be a minor
thing compared to the two-way merge machinery.


With the patch, the index and working directory aren't modified, it does still list the local modifications and says "Switched to..."

The changes in behavior come from other inputs. For example, if you turned on sparse checkout and then did a "git checkout -b foo" the old code path would update the skip-worktree bit in the index and remove all the files from the working directory that were no longer specified in the sparse-checkout file. The new behavior would not do that as it no longer updates the index or working directory.

The checkout documentation doesn't _say_ that "git checkout -b foo" will also update the working directory after changing the sparse-checkout settings but it happens to do that. Is that a common scenario? No - but it is an undocumented behavior that will change if they opt in to this new behavior. At least the documentation for the new flag _does_ let the user know the working directory and index won't be changed so if they opt in, they shouldn't be too surprised.

Was the primary reason why the patch "changes the behaviour" because
nobody could prove that needs_working_tree_merge() helper reliably
detects that "checkout -b foo" case and that case alone, and show a
way to make sure it will keep doing so in the future when other new
features are added to the command?


My concern isn't ensuring that the patch reliably detects "checkout -b", the challenge is ensuring it will keep doing so as features are added that impact the "-b" option. The problem of adding a new option and having to ensure it behaves properly with all the other options isn't new but I agree this does add one more case that has to be handled.

@@ -479,6 +565,14 @@ static int merge_working_tree(const struct checkout_opts *opts,
  	int ret;
  	struct lock_file lock_file = LOCK_INIT;
+ /*
+	 * Skip merging the trees, updating the index, and work tree only if we
+	 * are simply creating a new branch via "git checkout -b foo."  Any
+	 * other options or usage will continue to do all these steps.
+	 */
+	if (core_optimize_checkout && !needs_working_tree_merge(opts, old_branch_info, new_branch_info))
+		return 0;
+
  	hold_locked_index(&lock_file, LOCK_DIE_ON_ERROR);
  	if (read_cache_preload(NULL) < 0)
  		return error(_("index file corrupt"));

Thanks.




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux