On Tue, Feb 21, 2012 at 11:16:37AM -0800, Junio C Hamano wrote:
> Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes:
>
> > I'm aware that Jeff's tackling at lower level, which retains
> > cache-tree for many more cases.
> >
> > But this patch seems simple and safe to me, and in my experience this
> > case happens quite often (or maybe I tend to keep my index clean).
> > Junio, any chance this patch may get in?
>
> I do not think we are talking about a duplicated effort here.
>
> By definition, the change to hook into unpack_trees() and making sure we
> invalidate all the necessary subtrees in the cache cannot give you a
> cache tree that is more populated than what you started with. And the
> train of thought in Peff's message is to improve this invalidation---we
> currently invalidate everything ;-)
>
> Somebody has to populate the cache tree fully when we _know_ the index
> matches a certain tree, and adding a call to prime_cache_tree() in
> strategic places is a way to do so. The most obvious is write-tree, but
> there are a few other existing codepaths that do so.
>
> Because prime_cache_tree() by itself is a fairly expensive operation that
> reads all the trees recursively, its benefits need to be evaluated. It
> should to happen only in an operation that is already heavy-weight, is
> likely to have read all the trees and have many of them in-core cache,
> and also relatively rarely happens compared to "git add" so that the cost
> can be amortised over time, such as "reset --(hard|mixed)".

It's a tradeoff. As you said, prime_cache_tree() is expensive and
cache_tree_update() is supposed to be cheap. But cache_tree_update() on an
empty cache-tree is even more expensive than prime_cache_tree(). So it
boils down to how much cache-tree we still have after unpack_trees() and
whether it's worth repopulating it.

> Switching branches is likely to fall into that category, but that is just
> my gut feeling. I would feel better at night if somebody did a benchmark
> ;-)

I timed prime_cache_tree() and cache_tree_update() while switching branches
on linux-2.6, between v2.6.32 and a quite recent version.
prime_cache_tree() took ~55ms while cache_tree_update() took 150ms or 90ms
(depending on the final tree). Whether 55ms is expensive on such a
reasonably large repository depends on your point of view; the checkout
took several seconds for me to complete anyway. Checking out with "-q",
prime_cache_tree() took 145ms and 80ms respectively, which is as expensive
as cache_tree_update().

If cache-tree is only used at commit time, I think we could delay
prime_cache_tree() until it is absolutely needed. We could add an optional
index extension recording the fact that the index matches a certain tree.
On the first cache_tree_invalidate_path(), if cache-tree is still empty, we
prime cache-tree and then invalidate the requested path. That would add no
cost to a quick branch switch. But if diff-cached also takes advantage of
cache-tree, it's a different story.
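To make the "prime lazily on the first invalidation" idea a bit more
concrete, here is a rough sketch, not a real patch: the index extension
does not exist yet, so matched_tree_sha1 below is a stand-in for whatever
that extension would record, and treating a missing cache-tree as "still
empty" is a simplification.

#include "cache.h"
#include "cache-tree.h"
#include "tree.h"

static void invalidate_path_lazily(const unsigned char *matched_tree_sha1,
				   const char *path)
{
	/*
	 * Assume "no cache-tree at all" means it was never populated
	 * since the last checkout; a real implementation would need a
	 * more careful test.
	 */
	if (matched_tree_sha1 && !active_cache_tree) {
		struct tree *tree = parse_tree_indirect(matched_tree_sha1);
		if (tree)
			/* repopulate from the tree the index is known to match */
			prime_cache_tree(&active_cache_tree, tree);
	}
	/* then invalidate only the subtree that contains 'path' */
	cache_tree_invalidate_path(active_cache_tree, path);
}

The point being that a plain branch switch would pay nothing; the tree
reading cost is only paid once something later actually touches the index.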
Anyway, I think this patch does better than my last one.

-- 8< --
diff --git a/builtin/checkout.c b/builtin/checkout.c
index 6b9061f..e7eaeec 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -387,6 +387,7 @@ static int merge_working_tree(struct checkout_opts *opts,
 	int ret;
 	struct lock_file *lock_file = xcalloc(1, sizeof(struct lock_file));
 	int newfd = hold_locked_index(lock_file, 1);
+	int head_index_mismatch = 1;
 
 	if (read_cache_preload(NULL) < 0)
 		return error(_("corrupt index file"));
@@ -396,6 +397,7 @@ static int merge_working_tree(struct checkout_opts *opts,
 		ret = reset_tree(new->commit->tree, opts, 1);
 		if (ret)
 			return ret;
+		head_index_mismatch = 0;
 	} else {
 		struct tree_desc trees[2];
 		struct tree *tree;
@@ -490,7 +492,27 @@ static int merge_working_tree(struct checkout_opts *opts,
 			ret = reset_tree(new->commit->tree, opts, 0);
 			if (ret)
 				return ret;
-		}
+		} else
+			head_index_mismatch = topts.head_index_mismatch;
+	}
+
+	/*
+	 * Currently cache-tree is always destroyed after
+	 * unpack_trees(). It's probably a good idea to repopulate
+	 * cache-tree. If the user makes a few modifications and
+	 * commits, tree generation would be cheap. If they switch
+	 * away again, not so cheap.
+	 *
+	 * When unpack_trees() learns to retain as much cache-tree as
+	 * possible, this code probably does not help much on tree
+	 * generation, unless the two heads differ so much that
+	 * little cache-tree can be kept.
+	 */
+	if (!head_index_mismatch &&
+	    !cache_tree_fully_valid(active_cache_tree)) {
+		if (!new->commit->tree->object.parsed)
+			parse_tree(new->commit->tree);
+		prime_cache_tree(&active_cache_tree, new->commit->tree);
 	}
 
 	if (write_cache(newfd, active_cache, active_nr) ||
diff --git a/unpack-trees.c b/unpack-trees.c
index 7c9ecf6..f2c518f 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1022,6 +1022,8 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options
 	o->result.timestamp.nsec = o->src_index->timestamp.nsec;
 	o->merge_size = len;
 	mark_all_ce_unused(o->src_index);
+	if (o->fn != twoway_merge)
+		o->head_index_mismatch = 1;
 
 	/*
 	 * Sparse checkout loop #1: set NEW_SKIP_WORKTREE on existing entries
@@ -1736,6 +1738,8 @@ int twoway_merge(struct cache_entry **src, struct unpack_trees_options *o)
 		    (oldtree && newtree &&
 		     !same(oldtree, newtree) && /* 18 and 19 */
 		     same(current, newtree))) {
+			if (!newtree || (newtree && !same(current, newtree)))
+				o->head_index_mismatch = 1;
 			return keep_entry(current, o);
 		}
 		else if (oldtree && !newtree && same(current, oldtree)) {
diff --git a/unpack-trees.h b/unpack-trees.h
index 5e432f5..b75b64e 100644
--- a/unpack-trees.h
+++ b/unpack-trees.h
@@ -48,7 +48,8 @@ struct unpack_trees_options {
 		     gently,
 		     exiting_early,
 		     show_all_errors,
-		     dry_run;
+		     dry_run,
+		     head_index_mismatch;
 	const char *prefix;
 	int cache_bottom;
 	struct dir_struct *dir;
-- 8< --
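As an aside, for anyone wondering where the prime_cache_tree() numbers
above come from: priming is essentially one full recursive walk over the
target commit's trees, so it scales with the number of tree objects. The
fragment below is only an illustration of that traversal shape, not git
code; count_tree_entries() is a made-up function, written against the
tree-walk API as it exists in this codebase.

#include "cache.h"
#include "tree.h"
#include "tree-walk.h"

/* Count the blobs under 'tree', parsing every subtree on the way down. */
static int count_tree_entries(struct tree *tree)
{
	struct tree_desc desc;
	struct name_entry entry;
	int cnt = 0;

	if (!tree->object.parsed && parse_tree(tree))
		return 0;
	init_tree_desc(&desc, tree->buffer, tree->size);
	while (tree_entry(&desc, &entry)) {
		if (S_ISDIR(entry.mode)) {
			struct tree *sub = lookup_tree(entry.sha1);
			if (sub)
				cnt += count_tree_entries(sub);
		} else {
			cnt++;
		}
	}
	return cnt;
}

That cost is why the patch only primes when the index is known to match the
new HEAD and cache_tree_fully_valid() reports that something is actually
missing.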