On Tue, Jan 11, 2022 at 10:30 AM Victoria Dye <vdye@xxxxxxxxxx> wrote: > > Elijah Newren wrote: > > Rather than lstat()'ing every SKIP_WORKTREE path, take advantage of the > > fact that entire directories will often be missing, especially for cone > > mode and even more so ever since commit 55dfcf9591 ("sparse-checkout: > > clear tracked sparse dirs", 2021-09-08). If we have already determined > > that the parent directory of a file (or any other previous ancestor) > > does not exist, then we already know the file cannot exist and do not > > need to lstat() it separately. > > > > Granted, the cost of ensure_skip_worktree_means_skip_worktree() might > > be considered a bit high for non-cone mode since it might now lstat() > > every SKIP_WORKTREE path when the index is loaded (an O(N) cost, with > > N the number of SKIP_WORKTREE paths), but non-cone mode users already > > have to deal with the O(N*M) cost (with N=the number of tracked files > > and M=the number of sparsity patterns), so this should be reasonable. > > > > Did you write/run any performance tests to see how this optimization changed > the execution time? If not, running the `p2000` performance tests against > the patch series base, [3/5], and [5/5] would provide some really helpful > insight into the cost of `ensure_skip_worktree_means_skip_worktree`, then > how much this optimization improves it. I haven't[1]. You bring up a very good point; I'll add it for the next round. [1] Long, probably irrelevant story about why: My original patches were actually going to go further and just remove the present-despite-SKIP_WORKTREE files in _all_ cases, sparse-checkout or not. It had not occurred to me while writing the patches to make it specific to sparse-checkouts. Because of that, I figured it was better to get feedback on if the idea was acceptable and spent a lot more time concentrating on making the case. Then I realized near the end that folks who don't use sparse-checkout or SKIP_WORKTREE might be annoyed at the overhead also being added for them, for a feature they'll never even use. I decided to back off a bit, and make it sparse-checkout specific. Then I realized that backing off might just keep all users happy anyway (the folks who intentionally use present-despite-SKIP_WORKTREE paths, despite their many warts, can keep doings so) and edited a lot of my commit messages, docs, and cover letter. And by then it was late Saturday night and I had promised to send out my series on Friday. Since I had already marked my cover letter as RFC anyway, I just decided to temporarily punt on getting performance numbers...