On Mon, Oct 31, 2022 at 12:56:31AM +0000, Anh Le via GitGitGadget wrote: > From: Anh Le <anh@xxxxxxxxx> > > In a large repository using sparse checkout, checking whether a file marked > with skip worktree is present on disk and its skip worktree bit should be > cleared can take a considerable amount of time. Add a trace2 region to > surface this information, keeping a count of how many paths have been > checked and separately keep counts for after a full index is materialised. I think the text you wrote here is fine, but it did take me a couple of passes to make sure I understood it correctly. Perhaps the following might be clearer: When using sparse checkout, clear_skip_worktree_from_present_files() must enumerate index entries to find ones with the SKIP_WORKTREE bit to determine whether those index entries exist on disk (in which case their SKIP_WORKTREE bit should be removed). In a large repository, this may take considerable time depending on the size of the index. Add a trace2 region to surface this information, keeping a count o fhow many paths have been checked. Separately, keep counts after a full index is materialized. Is that accurate? > diff --git a/sparse-index.c b/sparse-index.c > index e4a54ce1943..8301129bf8f 100644 > --- a/sparse-index.c > +++ b/sparse-index.c > @@ -493,24 +493,42 @@ void clear_skip_worktree_from_present_files(struct index_state *istate) > int dir_found = 1; > > int i; > + int path_count[2] = {0, 0}; > + int restarted = 0; Looks good. > if (!core_apply_sparse_checkout || > sparse_expect_files_outside_of_patterns) > return; > > + trace2_region_enter("index", "clear_skip_worktree_from_present_files", > + istate->repo); Here and below, there is some funky indentation. When I apply the patch locally, I find that I need to squash this in: --- >8 --- diff --git a/sparse-index.c b/sparse-index.c index 8301129bf8..23f997e103 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -501,7 +501,7 @@ void clear_skip_worktree_from_present_files(struct index_state *istate) return; trace2_region_enter("index", "clear_skip_worktree_from_present_files", - istate->repo); + istate->repo); restart: for (i = 0; i < istate->cache_nr; i++) { struct cache_entry *ce = istate->cache[i]; --- 8< --- > restart: > for (i = 0; i < istate->cache_nr; i++) { > struct cache_entry *ce = istate->cache[i]; > > - if (ce_skip_worktree(ce) && > - path_found(ce->name, &last_dirname, &dir_len, &dir_found)) { > - if (S_ISSPARSEDIR(ce->ce_mode)) { > - ensure_full_index(istate); > - goto restart; > + if (ce_skip_worktree(ce)) { > + path_count[restarted]++; > + if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) { > + if (S_ISSPARSEDIR(ce->ce_mode)) { > + if (restarted) > + BUG("ensure-full-index did not fully flatten?"); > + ensure_full_index(istate); > + restarted = 1; > + goto restart; > + } > + ce->ce_flags &= ~CE_SKIP_WORKTREE; > } > - ce->ce_flags &= ~CE_SKIP_WORKTREE; Looks fine, though the "if (restarted) BUG(...)" is new. It's not wrong, per-se, but seeing it in a separate commit either before or after the main change would be appreciated. > } > } > + > + if (path_count[0]) > + trace2_data_intmax("index", istate->repo, > + "sparse_path_count", path_count[0]); > + if (restarted) > + trace2_data_intmax("index", istate->repo, > + "sparse_path_count_full", path_count[1]); > + trace2_region_leave("index", "clear_skip_worktree_from_present_files", > + istate->repo); Same note about indentation here. Thanks, Taylor