2010/11/15 Jonathan Nieder <jrnieder@xxxxxxxxx>: > Nguyán ThÃi Ngác Duy wrote: > >> Earlier, the will_have_skip_worktree() checks are done in various places: >> >> 1. in verify_* functions to prevent absent/uptodate checks outside >> Â Âworktree >> 2. in merged_entry for new index entries >> 3. in apply_sparse_checkout() for all entries >> >> In case all entries are added new ($GIT_DIR/index is missing) all the >> above checks will be done for all entries, which in the worst case can >> become cache_nr*el->nr*3 fnmatch() calls. Quite expensive. > > Does this mean something like: > > Â Â Â ÂHandling of sparse checkouts is slow. > > Â Â Â Â[timings] > > Â Â Â ÂTo fix this, we will do such-and-such. ÂAs a first step, > Â Â Â Âwe'll do such-and-such which does not change semantics > Â Â Â Âand at least does not make it any slower. > > ? > > In other words, any commit message should make clear > > Â1. The purpose. > Â2. The baseline of (sane or insane) behavior that is affected. > Â3. The intended resulting functionality. > > How the patch works and why it succeeds are just optional extras > (nice, certainly, but in 90% of cases it is obvious from the code once > you know (1), (2), and (3) anyway). Addressing the slowness is a nice side effect. My main concern is to collect all skip-worktree checks in a central place so that I can do tree-like traversing. >> --- a/cache.h >> +++ b/cache.h >> @@ -182,6 +182,7 @@ struct cache_entry { >> Â#define CE_WT_REMOVE (0x400000) /* remove in work directory */ >> >> Â#define CE_UNPACKED Â(0x1000000) >> +#define CE_NEW_SKIP_WORKTREE (0x2000000) > > Semantics? Yep, missed this in the commit message. While unpack_trees() is running, this bit will be set in entries that are outside worktree area, according to the (possibly updated) sparse-checkout file. Compare this bit with CE_SKIP_WORKTREE, we would know this entry should be removed/added due to sparse-checkout updates. >> --- a/unpack-trees.c >> +++ b/unpack-trees.c >> @@ -258,7 +258,7 @@ static int apply_sparse_checkout(struct cache_entry *ce, struct unpack_trees_opt >> Â{ >> Â Â Â int was_skip_worktree = ce_skip_worktree(ce); >> >> - Â Â if (!ce_stage(ce) && will_have_skip_worktree(ce, o)) >> + Â Â if (ce->ce_flags & CE_NEW_SKIP_WORKTREE) >> Â Â Â Â Â Â Â ce->ce_flags |= CE_SKIP_WORKTREE; > > So CE_NEW_SKIP_WORKTREE roughly means "stage-0 entry outside the sparse checkout area"? Yes. Conflicted entries must always stay in worktree regardless sparse-checkout patterns. > >> Â Â Â else >> Â Â Â Â Â Â Â ce->ce_flags &= ~CE_SKIP_WORKTREE; > > In particular, I guess the NEW part refers to the sparse checkout > area, not the entry, since existing index entries with SKIP_WORKTREE > bits need to keep that bit. The NEW part still refers to entry. It's basically ce_flags[CE_SKIP_WORKTREE] = ce_flags[CE_NEW_SKIP_WORKTREE]. The former bit is kept in file, the later is in memory only. >> + >> +static int verify_absent(struct cache_entry *, enum unpack_trees_error_types, struct unpack_trees_options *); >> +static int set_new_skip_worktree_2(struct unpack_trees_options *o) >> +{ >> + Â Â int i; >> + >> + Â Â /* >> + Â Â Â* CE_ADDED marks new index entries. These have not been processed >> + Â Â Â* by set_new_skip_worktree_1() so we do it here. >> + Â Â Â*/ > > Probably this comment belongs at the call site instead, to avoid some > chasing. There is a comment in the callsite, though it does not directly say "set_new_skip_worktree_2". Will fix. >> + Â Â for (i = 0;i < o->result.cache_nr;i++) { >> + Â Â Â Â Â Â struct cache_entry *ce = o->result.cache[i]; >> + >> + Â Â Â Â Â Â if (!(ce->ce_flags & CE_ADDED)) >> + Â Â Â Â Â Â Â Â Â Â continue; >> + >> + Â Â Â Â Â Â ce->ce_flags &= ~CE_ADDED; >> + Â Â Â Â Â Â if (!ce_stage(ce) && will_have_skip_worktree(ce, o)) >> + Â Â Â Â Â Â Â Â Â Â ce->ce_flags |= CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE; >> + Â Â Â Â Â Â else >> + Â Â Â Â Â Â Â Â Â Â ce->ce_flags &= ~(CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE); > > CE_ADDED is private to the add_file_to_index() code path shared > between add and rerere builtins. Ârerere is libified and used e.g. by > cherry-pick foo..bar. ÂCan it get us in trouble or do we have clear > the flags before using them here? ÂI think it would be worth a note in > api-in-core-index.txt to warn future refactorers. If it's too much trouble, I can use a new bit. Though the number of available bits is decreasing. >> + >> + Â Â Â Â Â Â /* Left-over checks from merged_entry when old == NULL */ > > Huh? Â(That is completely cryptic outside the context of the patch.) Will elaborate. >> @@ -862,6 +905,9 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options >> Â Â Â Â Â Â Â Â Â Â Â o->el = ⪙ >> Â Â Â } >> >> + Â Â if (!o->skip_sparse_checkout) >> + Â Â Â Â Â Â set_new_skip_worktree_1(o); >> + > > Why is this not folded into the above if ()? > > This populates the NEW_SKIP_WORKTREE (= future SKIP_WORKTREE?) bit > in the index that represents the preimage for a reset or merge. > > Perhaps call it: > > Â Â Â Â Â Â Â Âset_new_skip_worktree(o->src_index, 0); > > and mark that function __attribute__((always_inline)) if the > optimizer doesn't want to cooperate? It's setting future skip-worktree on existing entries index, contrast with the _2() function, which is setting future skip-worktree on _new_ entries in the index. How do you name those functions? Wait.. >> Â Â Â memset(&o->result, 0, sizeof(o->result)); >> Â Â Â o->result.initialized = 1; >> Â Â Â o->result.timestamp.sec = o->src_index->timestamp.sec; >> @@ -922,6 +968,10 @@ int unpack_trees(unsigned len, struct tree_desc *t, struct unpack_trees_options >> >> Â Â Â if (!o->skip_sparse_checkout) { >> Â Â Â Â Â Â Â int empty_worktree = 1; >> + >> + Â Â Â Â Â Â if (set_new_skip_worktree_2(o)) >> + Â Â Â Â Â Â Â Â Â Â goto return_failed; >> + > > Trivial part of the merge is over. ÂSo now we can check the new > index entries against the sparse worktree patterns. ÂThey were added in > that trivial part using add_entry() and will have the CE_ADDED flag. > > So this might be: > > Â Â Â Â Â Â Â Âset_new_skip_worktree(&o->result, CE_ADDED); > > or > > Â Â Â Â Â Â Â Âset_result_skip_worktree(o); > OK I like the former over the latter. > The CE_ADDED flag was set in merged_entry, called by the merge_fn_t > implementations. ÂIf there were an api-traverse-trees.txt explaining > the API, it would be worth a note there; for now it should suffice > to add a note to future merge_fn_t implementors in the commit > message ("each unpack-trees merge function arranges for > CE_SKIP_WORKTREE and CE_SKIP_NEW_WORKTREE to be propagated and for the > CE_ADDED flag to be set on entries of new paths"). OK >> @@ -1017,7 +1067,7 @@ static int verify_uptodate_1(struct cache_entry *ce, >> Âstatic int verify_uptodate(struct cache_entry *ce, >> Â Â Â Â Â Â Â Â Â Â Â Â Âstruct unpack_trees_options *o) >> Â{ >> - Â Â if (!o->skip_sparse_checkout && will_have_skip_worktree(ce, o)) >> + Â Â if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE)) > [...] >> @@ -1204,7 +1254,7 @@ static int verify_absent(struct cache_entry *ce, >> Â Â Â Â Â Â Â Â Â Â Â Âenum unpack_trees_error_types error_type, >> Â Â Â Â Â Â Â Â Â Â Â Âstruct unpack_trees_options *o) >> Â{ >> - Â Â if (!o->skip_sparse_checkout && will_have_skip_worktree(ce, o)) >> + Â Â if (!o->skip_sparse_checkout && (ce->ce_flags & CE_NEW_SKIP_WORKTREE)) > > The point, I hope. ÂCurrently we alternate between finding entries to > remove and deciding whether if lie in the worktree. ÂAfter this patch, > whether they lie in the worktree is precomputed. Yep. >> @@ -1226,10 +1276,15 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old, >> Â Â Â int update = CE_UPDATE; >> >> Â Â Â if (!old) { >> + Â Â Â Â Â Â /* >> + Â Â Â Â Â Â Â* Set CE_NEW_SKIP_WORKTREE on "merge" to make >> + Â Â Â Â Â Â Â* verify_absent() no-op. The true check will be done >> + Â Â Â Â Â Â Â* later on in unpack_trees(). >> + Â Â Â Â Â Â Â*/ >> + Â Â Â Â Â Â merge->ce_flags |= CE_NEW_SKIP_WORKTREE; > > Mm? ÂSince verify_absent() is a no-op, why call it? ÂThis looks like a > placeholder for code that calls verify_absent later, when we know if > it lies in the worktree. It is no-op only when sparse checkout is enabled. In such cases we don't know (yet) whether we will add those files in worktree. In normal cases, everything must be added in worktree, so verify_absent() is real. >> @@ -1245,8 +1300,8 @@ static int merged_entry(struct cache_entry *merge, struct cache_entry *old, >> Â Â Â Â Â Â Â } else { >> Â Â Â Â Â Â Â Â Â Â Â if (verify_uptodate(old, o)) >> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â return -1; >> - Â Â Â Â Â Â Â Â Â Â if (ce_skip_worktree(old)) >> - Â Â Â Â Â Â Â Â Â Â Â Â Â Â update |= CE_SKIP_WORKTREE; >> + Â Â Â Â Â Â Â Â Â Â /* Migrate old bits over */ >> + Â Â Â Â Â Â Â Â Â Â update |= old->ce_flags & (CE_SKIP_WORKTREE | CE_NEW_SKIP_WORKTREE); > > Thanks, this looks promising. Thanks for looking at it. These stuff can be error-prone. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html