On Thu, Jun 27, 2024 at 1:59 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: > > "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > > > -void clear_skip_worktree_from_present_files(struct index_state *istate) > > +static int clear_skip_worktree_from_present_files_sparse(struct index_state *istate) > > { > > const char *last_dirname = NULL; > > size_t dir_len = 0; > > int dir_found = 1; > > > > - int i; > > - int path_count[2] = {0, 0}; > > - int restarted = 0; > > + int path_count = 0; > > + int to_restart = 0; > > > > - if (!core_apply_sparse_checkout || > > - sparse_expect_files_outside_of_patterns) > > - return; > > - > > - trace2_region_enter("index", "clear_skip_worktree_from_present_files", > > + trace2_region_enter("index", "clear_skip_worktree_from_present_files_sparse", > > istate->repo); > > -restart: > > - for (i = 0; i < istate->cache_nr; i++) { > > + for (int i = 0; i < istate->cache_nr; i++) { > > struct cache_entry *ce = istate->cache[i]; > > > > if (ce_skip_worktree(ce)) { > > - path_count[restarted]++; > > + path_count++; > > if (path_found(ce->name, &last_dirname, &dir_len, &dir_found)) { > > if (S_ISSPARSEDIR(ce->ce_mode)) { > > - if (restarted) > > - BUG("ensure-full-index did not fully flatten?"); > > - ensure_full_index(istate); > > - restarted = 1; > > - goto restart; > > + to_restart = 1; > > + break; > > } > > ce->ce_flags &= ~CE_SKIP_WORKTREE; > > } > > } > > } > > Both original and the rewritten code shares one trait, which is that > it goes from the beginning to check all paths that are marked with > SKIP_WORKTREE bit to clear CE_SKIP_WORKTREE bits from them, until > they find a S_ISSPARSEDIR entry and lazily call ensure_full_index(), > but then when retrying after calling ensure_full_index(), it > restarts from the beginning. I wonder if it would help, especially > if the S_ISSPARSEDIR entry comes very late in the index (e.g., by "the S_ISSPARSEDIR entry"? Are you assuming there is only one? > returning "here is where we have already checked during the first > run, until we realized that we first need to do ensure_full_index()" > to the caller from here, and then the caller tells the second phase > to restart from there), to reduce the number of calls to path_found()? My first assumption about how to "restart from there", was to pass the current index, i, to the more involved check. That might fail to be the correct index when there are multiple S_ISSPARSEDIR entries and the first one (or more) are actually missing from the working copy as expected. However, if "restart from there" is just passing the name of the relevant directory and we do some kind of binary search to find where the earliest entry under that directory would exist in the expanded index, then that seems like a reasonable additional optimization.