Re: [PATCH 0/5] sparse-index: improve clear_skip_worktree_from_present_files()

On 6/20/24 3:16 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
>
>> While doing some investigation in a private monorepo with sparse-checkout
>> and a sparse index, I accidentally left a modified file outside of my
>> sparse-checkout cone. This caused my Git commands to slow to a crawl, so I
>> reran with GIT_TRACE2_PERF=1.
>>
>> While I was able to identify clear_skip_worktree_from_present_files() as the
>> culprit, it took longer than desired to figure out what was going on. This
>> series intends to both fix the performance issue (as much as possible) and
>> do some refactoring to make it easier to understand what is happening.
>>
>> In the end, I was able to reduce the number of lstat() calls in my case from
>> over 170,000 to about 6,500, improving the time from 2.5s to 71ms on a warm
>> disk cache.
>>
>> Thanks,
>> Stolee
>
> That's impressive but I cannot offhand tell how big 170k (or 6.5k
> for that matter) is relative to the size of the tree.  How many
> paths are there in the entire tree (i.e. "git ls-tree -r HEAD | wc -l")
> vs the number of the in-cone paths in the working tree?
>
> If 6.5k is in the same ballpark as the latter, it would be really
> good.

You're right, I didn't include the full context here. The repo has
about 2.1 million paths at HEAD, but most of them are sparse.

In Patch 5, I detail that there are 1,841,997 total sparse files in
the expanded index. Thus, the previous caching algorithm was already
doing decent work and calling lstat() 11x fewer times than the naive
implementation.

The new caching algorithm improves this to 6,521, which is a 282x
improvement over naive and a 26x improvement over the previous
caching algorithm.

But what you are really asking is how close this is to the optimal.
I didn't include that in the Patch 5 details, but I was able to look
at my notes and see that the sparse_path_count data point was 1,962,
meaning there are that many sparse trees in the sparse index before
expanding. Thus, the 6,521 lstat() calls are about 3.3x the absolute
minimum required.

Does that help answer the questions you had? I'm happy to provide
more information.

Thanks,
-Stolee
