Re: git status: small difference between stating whole repository and small subdirectory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 21, 2012 at 11:16:37AM -0800, Junio C Hamano wrote:
> Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:
> 
> > I'm aware that Jeff's tackling at lower level, which retains
> > cache-tree for many more cases.
> >
> > But this patch seems simple and safe
> > to me, and in my experience this case happens quite often (or maybe I
> > tend to keep my index clean). Junio, any chance this patch may get in?
> 
> I do not think we are talking about a duplicated effort here.
> 
> By definition, the change to hook into unpack_trees() and making sure we
> invalidate all the necessary subtrees in the cache cannot give you a cache
> tree that is more populated than what you started with.  And the train of
> thought in Peff's message is to improve this invalidation---we currently
> invalidate everything ;-)
> 
> Somebody has to populate the cache tree fully when we _know_ the index
> matches a certain tree, and adding a call to prime_cache_tree() in
> strategic places is a way to do so.  The most obvious is write-tree, but
> there are a few other existing codepaths that do so.
> 
> Because prime_cache_tree() by itself is a fairly expensive operation that
> reads all the trees recursively, its benefits need to be evaluated. It
> should to happen only in an operation that is already heavy-weight, is
> likely to have read all the trees and have many of them in-core cache, and
> also relatively rarely happens compared to "git add" so that the cost can
> be amortised over time, such as "reset --(hard|mixed)".

It's tradeoff. As you said prime_cache_tree() is expensive.
cache_tree_update is supposed to be cheap. But cache_tree_update() when
empty is even more expensive than prime_cache_tree(). So it boils down
how much cache-tree we have after unpack_trees() and whether it's worth
repopulate cache-tree again.

> Switching branches is likely to fall into that category, but that is just
> my gut feeling.  I would feel better at night if somebody did a benchmark
> ;-)

I timed prime_cache_tree() and cache_tree_update() while switching branch
on linux-2.6, between v2.6.32 and a quite recent version. prime_cache_tree()
took ~55ms while cache_tree_update() 150ms or 90ms (depending on final tree).
It depends on your view, whether 55ms is expensive on such a reasonably large
repository. I took several seconds for me to complete the checkout though.

Checking out with "-q" prime_cache_tree() took 145ms and 80ms respectively,
as expensive as cache_tree_update()

If cache-tree is only used at commit time, I think we could delay
prime_cache_tree() until absolutely needed. We could add an optional index
extension recording the fact that index matches certain tree.
On the first cache_tree_invalidate_path(), if cache-tree is still
empty, we prime cache-tree, then invalidate the requested path.
It would then add no cost to a quick branch switch.

But if diff-cached also takes advantage of cache-tree, it's a different story.

Anyway, I think this patch does better than my last one

-- 8< --
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 6b9061f..e7eaeec 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -387,6 +387,7 @@ static int merge_working_tree(struct checkout_opts *opts,
 	int ret;
 	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
 	int newfd = hold_locked_index(lock_file, 1);
+	int head_index_mismatch = 1;
 
 	if (read_cache_preload(NULL) < 0)
 		return error(_("corrupt index file"));
@@ -396,6 +397,7 @@ static int merge_working_tree(struct checkout_opts *opts,
 		ret = reset_tree(new->commit->tree, opts, 1);
 		if (ret)
 			return ret;
+		head_index_mismatch = 0;
 	} else {
 		struct tree_desc trees[2];
 		struct tree *tree;
@@ -490,7 +492,27 @@ static int merge_working_tree(struct checkout_opts *opts,
 			ret = reset_tree(new->commit->tree, opts, 0);
 			if (ret)
 				return ret;
-		}
+		} else
+			head_index_mismatch = topts.head_index_mismatch;
+	}
+
+	/*
+	 * Currently cache-tree is always destroyed after
+	 * unpack_trees(). It's probably a good idea to repopulate
+	 * cache-tree. If the user makes a few modifications and
+	 * commits, tree generation woulda be cheap. If they switch
+	 * away again, not so cheap.
+	 *
+	 * When unpack_trees() learns to retains as much cache-tree as
+	 * possible, this code probably does not help much on tree
+	 * generation, unless the tree difference between to heads are
+	 * too far, little cache-tree can be kept.
+	 */
+	if (!head_index_mismatch &&
+	    !cache_tree_fully_valid(active_cache_tree)) {
+		if (!new->commit->tree->object.parsed)
+			parse_tree(new->commit->tree);
+		prime_cache_tree(&active_cache_tree, new->commit->tree);
 	}
 
 	if (write_cache(newfd, active_cache, active_nr) ||
diff --git a/unpack-trees.c b/unpack-trees.c
index 7c9ecf6..f2c518f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1022,6 +1022,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->result.timestamp.nsec = o->src_index->timestamp.nsec;
 	o->merge_size = len;
 	mark_all_ce_unused(o->src_index);
+	if (o->fn != twoway_merge)
+		o->head_index_mismatch = 1;
 
 	/*
 	 * Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
@@ -1736,6 +1738,8 @@ int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o)
 		    (oldtree && newtree &&
 		     !same(oldtree, newtree) && /* 18 and 19 */
 		     same(current, newtree))) {
+			if (!newtree || (newtree && !same(current, newtree)))
+				o->head_index_mismatch = 1;
 			return keep_entry(current, o);
 		}
 		else if (oldtree && !newtree && same(current, oldtree)) {
diff --git a/unpack-trees.h b/unpack-trees.h
index 5e432f5..b75b64e 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -48,7 +48,8 @@ struct unpack_trees_options {
 		     gently,
 		     exiting_early,
 		     show_all_errors,
-		     dry_run;
+		     dry_run,
+		     head_index_mismatch;
 	const char *prefix;
 	int cache_bottom;
 	struct dir_struct *dir;
-- 8< --
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]