Ben Peart <peartben@xxxxxxxxx> writes:

> Base Ref: master
> Web-Diff: https://github.com/benpeart/git/commit/a0300882d4
> Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v6 && git checkout a0300882d4
>
> This iteration brings back the Index Entry Offset Table (IEOT) extension
> which enables us to multi-thread the cache entry parsing without having
> the primary thread scan all the entries first. In cases where the
> cache entry parsing is the most expensive part, this yields some additional
> savings.

Nice.

> Test w/100,000 files    Baseline    Optimize V4     Extensions      Entries
> ----------------------------------------------------------------------------
> 0002.1: read_cache      22.36       18.74 -16.2%    18.64 -16.6%    12.63 -43.5%
>
> Test w/1,000,000 files  Baseline    Optimize V4     Extensions      Entries
> -----------------------------------------------------------------------------
> 0002.1: read_cache      304.40      270.70 -11.1%   195.50 -35.8%   204.82 -32.7%
>
> Note that in the 1,000,000 files case, multi-threading the cache entry
> parsing does not yield a performance win. This is because the cost to parse
> the index extensions in this repo far outweighs the cost of loading the
> cache entries.
> ...
> The high cost of parsing the index extensions is driven by the cache tree
> and the untracked cache extensions. As this is currently the longest pole,
> any reduction in this time will reduce the overall index load times, so it
> is worth further investigation in another patch series.

Interesting.

> One option would be to load each extension on a separate thread, but I
> believe that is overkill for the vast majority of repos. Instead, some
> optimization of the loading code for these two extensions is probably worth
> looking into, as a quick examination shows that the bulk of the time for
> both of them is spent in xcalloc().

Thanks.
Looking forward to blocking off some quality time to read this through, but from a cursory look (read: a diff against the previous round), this looks quite promising.