On 4/20/2021 8:44 PM, Elijah Newren wrote: > On Tue, Apr 13, 2021 at 7:01 AM Derrick Stolee via GitGitGadget > <gitgitgadget@xxxxxxxxx> wrote: >> >> From: Derrick Stolee <dstolee@xxxxxxxxxxxxx> >> >> By testing 'git -c core.fsmonitor= status -uno', we can check for the >> simplest index operations that can be made sparse-aware. The necessary >> implementation details are already integrated with sparse-checkout, so >> modify command_requires_full_index to be zero for cmd_status(). >> >> By running the debugger for 'git status -uno' after that change, we find >> two instances of ensure_full_index() that were added for extra safety, >> but can be removed without issue. >> >> In refresh_index(), we loop through the index entries. The >> refresh_cache_ent() method copies the sparse directories into the >> refreshed index without issue. > > I do see the removal of a call to ensure_full_index() in > refresh_index() that you mention in this paragraph in the patch below. > > I'm confused, though; I would have thought we wanted to avoid a > refresh_cache_ent() call. Also, one of your previous patches added a > > if (istate->sparse_index && S_ISSPARSEDIR(ce->ce_mode)) > continue; > > check before the code ever gets to the refresh_cache_ent() call, so as > far as I can tell, that function won't be called from refresh_entry() > for sparse entries. Maybe your commit message here is out-of-date? > Or am I confused somehow? > >> The loop within run_diff_files() skips things that are in stage 0 and >> have skip-worktree enabled, so seems safe to disable ensure_full_index() >> here. > > Unlike the above, I don't see a removal of a ensure_full_index() call > in run_diff_files() as claimed by this paragraph. Has the commit > message gotten out of date with refactorings you did while developing > this series? I greatly reduced the number of ensure_full_index() calls in the previous topic (ds/sparse-index-protections) since first writing this patch, so it is very likely to be out-of-date. Thanks for calling it out. >> This allows some cases of 'git status' to no longer expand a sparse >> index to a full one, giving the following performance improvements for >> p2000-sparse-checkout-operations.sh: >> >> Test HEAD~1 HEAD >> ----------------------------------------------------------------------------- >> 2000.2: git status (full-index-v3) 0.38(0.36+0.07) 0.37(0.31+0.10) -2.6% >> 2000.3: git status (full-index-v4) 0.38(0.29+0.12) 0.37(0.30+0.11) -2.6% >> 2000.4: git status (sparse-index-v3) 2.43(2.33+0.14) 0.04(0.05+0.04) -98.4% >> 2000.5: git status (sparse-index-v4) 2.44(2.35+0.13) 0.05(0.04+0.07) -98.0% >> >> Note that since HEAD~1 was expanding the sparse index by parsing trees, >> it was artificially slower than the full index case. Thus, the 98% >> improvement is misleading, and instead we should celebrate the 0.37s to >> 0.05s improvement of 82%. This is more indicative of the peformance >> gains we are expecting by using a sparse index. > > 82%, very nice. Was this with git.git as the test repository, or some > other repo? If it's git.git, then we'd actually expect a much bigger > speedup for other repositories, as git.git is pretty small. This test script takes the input repository (git.git in this case) and creates a tree that contains that repository many times over, but only four copies remain in the sparse-checkout definition. This creates the big speedup, because of the enormous difference in index size. As I am exploring commands such as 'merge' and 'rebase' I am finding that this test setup is too expensive to cover those commands. I will need to reduce the size of the test repository (by a factor of 4) and that will reduce how impressive these results are while making the more complicated commands testable in a reasonable amount of time. Thanks, -Stolee