On Fri, Jun 28, 2024 at 5:43 AM Derrick Stolee via GitGitGadget
<gitgitgadget@xxxxxxxxx> wrote:
>
> While doing some investigation in a private monorepo with sparse-checkout
> and a sparse index, I accidentally left a modified file outside of my
> sparse-checkout cone. This caused my Git commands to slow to a crawl, so I
> reran with GIT_TRACE2_PERF=1.
>
> While I was able to identify clear_skip_worktree_from_present_files() as the
> culprit, it took longer than desired to figure out what was going on. This
> series intends to both fix the performance issue (as much as possible) and
> do some refactoring to make it easier to understand what is happening.
>
> In the end, I was able to reduce the number of lstat() calls in my case from
> over 1.1 million to about 4,400, improving the time from 13.4s to 81ms on a
> warm disk cache. (These numbers are from a test after v2, which somehow hit
> the old caching algorithm even worse than my test in v1.)
>
>
> Updates in v3
> =============
>
>  * Removed the incorrect paragraph in the commit message of patch 1.
>  * Replaced "largest" with "longest" in the final patch.
>
> Thanks, Stolee
>
> Derrick Stolee (5):
>   sparse-checkout: refactor skip worktree retry logic
>   sparse-index: refactor path_found()
>   sparse-index: use strbuf in path_found()
>   sparse-index: count lstat() calls
>   sparse-index: improve lstat caching of sparse paths
>
>  sparse-index.c | 216 +++++++++++++++++++++++++++++++++++++------------
>  1 file changed, 164 insertions(+), 52 deletions(-)
>
>
> base-commit: 66ac6e4bcd111be3fa9c2a6b3fafea718d00678d
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1754%2Fderrickstolee%2Fclear-skip-speed-v3
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1754/derrickstolee/clear-skip-speed-v3
> Pull-Request: https://github.com/gitgitgadget/git/pull/1754
>
> Range-diff vs v2:
>
>  1:  93d0baed0b0 ! 1:  0844cda94cf sparse-checkout: refactor skip worktree retry logic
>     @@ Commit message
>          stored in the index, so caching was introduced in d79d299352 (Accelerate
>          clear_skip_worktree_from_present_files() by caching, 2022-01-14).
>
>     -    If users are having trouble with the performance of this operation and
>     -    don't care about paths outside of the sparse-checkout, they can disable
>     -    them using the sparse.expectFilesOutsideOfPatterns config option
>     -    introduced in ecc7c8841d (repo_read_index: add config to expect files
>     -    outside sparse patterns, 2022-02-25).
>     -
>          This check is particularly confusing in the presence of a sparse index,
>          as a sparse tree entry corresponding to an existing directory must first
>          be expanded to a full index before examining the paths within. This is
>  2:  69c3beaabf7 = 2:  c242e2c9168 sparse-index: refactor path_found()
>  3:  0a82e6b4183 = 3:  ad63bf746ca sparse-index: use strbuf in path_found()
>  4:  9549f5b8062 = 4:  db6ded0df0d sparse-index: count lstat() calls
>  5:  0cb344ac14f ! 5:  1f58e19691f sparse-index: improve lstat caching of sparse paths
>     @@ sparse-index.c: static void clear_path_found_data(struct path_found_data *data)
>       }
>
>      +/**
>     -+ * Return the length of the largest common substring that ends in a
>     -+ * slash ('/') to indicate the largest common parent directory. Returns
>     ++ * Return the length of the longest common substring that ends in a
>     ++ * slash ('/') to indicate the longest common parent directory. Returns
>      + * zero if no common directory exists.
>      + */
>      +static size_t max_common_dir_prefix(const char *path1, const char *path2)
>
> --
> gitgitgadget

This version covers the last two outstanding items.

Reviewed-by: Elijah Newren <newren@xxxxxxxxx>
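
For readers who do not have patch 5 open, the behavior described by the doc comment in the range-diff can be sketched roughly as below. This is only an illustration written from that comment, not a copy of the code in the series; the name common_dir_prefix_len() is hypothetical and chosen to avoid confusion with the real max_common_dir_prefix().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: return the length of the longest common prefix
 * of path1 and path2 that ends in '/', or zero if the two paths share
 * no parent directory.
 */
static size_t common_dir_prefix_len(const char *path1, const char *path2)
{
	size_t common = 0;
	size_t i;

	assert(path1 && path2);

	for (i = 0; path1[i] && path2[i]; i++) {
		if (path1[i] != path2[i])
			break;
		/*
		 * Both paths agree up to and including this slash, so
		 * the shared directory prefix now has length i + 1.
		 */
		if (path1[i] == '/')
			common = i + 1;
	}

	return common;
}
```

With this sketch, "a/b/c" and "a/b/d" share the prefix "a/b/" (length 4), while "abc" and "abd" share no directory and give zero. Only positions where both strings agree on a '/' count, which is why a partial match inside a file or directory name does not extend the result.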