[PATCH v3 0/4] read-cache: speed up index load through parallelization

On further investigation of the previous patch, I noticed that my test
repos didn't contain the cache tree extension in their index. After making a
commit to ensure the extension existed, I realized that in some instances the
time to load the cache tree exceeded the time to load all the cache entries
in parallel.  Because the thread that reads the cache tree was started last
(the extensions could only be located after parsing through all the cache
entries), we weren't always getting optimal performance.

To better optimize for this case, I decided to write the End of Index Entry
(EOIE) extension as suggested by Junio [1] in response to my earlier
multithreading patch series [2].  This lets me spin up the thread that loads
the extensions earlier, because it no longer has to wait for all the cache
entries to be parsed first.
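
To illustrate why a trailer like EOIE removes the dependency on parsing the
entries, here is a minimal sketch. The layout is a deliberately simplified,
hypothetical one (signature, size, one big-endian offset, no trailing
checksum); the actual on-disk format is the one documented by patch 2/4 in
index-format.txt. The names toy_read_eoie_offset() and get_be32_toy() are made
up for this example and are not functions from read-cache.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical trailer layout, for this sketch only:
 *   4-byte signature "EOIE"
 *   4-byte big-endian payload size (always 4 here)
 *   4-byte big-endian offset of the first byte after the last cache entry
 */
#define TOY_TRAILER_LEN 12

static uint32_t get_be32_toy(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Return the offset at which the extensions start, or 0 if the buffer
 * does not end in a valid toy trailer. Only the last few bytes of the
 * buffer are inspected; no cache entry is parsed.
 */
static size_t toy_read_eoie_offset(const unsigned char *buf, size_t len)
{
	const unsigned char *trailer;
	uint32_t offset;

	if (len < TOY_TRAILER_LEN)
		return 0;
	trailer = buf + len - TOY_TRAILER_LEN;
	if (memcmp(trailer, "EOIE", 4))
		return 0;
	if (get_be32_toy(trailer + 4) != 4)
		return 0;
	offset = get_be32_toy(trailer + 8);
	if (offset > len - TOY_TRAILER_LEN)
		return 0; /* offset points past the trailer: reject */
	return offset;
}
```

With the offset known up front, the extension loader can be handed
buf + offset immediately instead of waiting for the entry parser to reach it.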

The big changes in this iteration are:

- add the EOIE extension
- update the index extension worker thread to start first

The absolute perf numbers don't look as good as the previous iteration
because not loading the cache tree at all is a lot faster than loading it in
parallel. These were measured with a V4 index that included a cache tree
extension.

I used p0002-read-cache.sh to generate some performance data on how the three
performance patches help:

p0002-read-cache.sh w/100,000 files
Baseline          expand_name_field()     Thread extensions       Thread entries
-----------------------------------------------------------------------------------------
22.34(0.01+0.12)  21.14(0.03+0.01) -5.4%  20.71(0.03+0.03) -7.3%  13.93(0.04+0.04) -37.6%

p0002-read-cache.sh w/1,000,000 files
Baseline           expand_name_field()      Thread extensions         Thread entries
----------------------------------------------------------------------------------------------
306.44(0.04+0.07)  295.42(0.01+0.07) -3.6%  217.60(0.03+0.04) -29.0%  199.00(0.00+0.10) -35.1%
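
For reference, the percentages in the tables are consistent with the change
in elapsed time relative to the Baseline column. A small helper that
reproduces them (relative_change_pct() is a name made up for this note, not
part of the perf framework):

```c
/*
 * Relative change of `value` against `baseline`, in percent.
 * Negative results mean `value` is faster than the baseline.
 */
static double relative_change_pct(double baseline, double value)
{
	return (value - baseline) / baseline * 100.0;
}
```

For example, relative_change_pct(22.34, 13.93) comes out at about -37.6,
matching the "Thread entries" column for 100,000 files.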

This patch conflicts with Duy's patch to remove the double memory copy and
pass in the previous ce instead.  The two will need to be merged/reconciled
once they settle down a bit.

[1] https://public-inbox.org/git/xmqq1sl017dw.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxx/
[2] https://public-inbox.org/git/20171109141737.47976-1-benpeart@xxxxxxxxxxxxx/


Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/325ec69299
Checkout: git fetch https://github.com/benpeart/git read-index-multithread-v3 && git checkout 325ec69299


### Patches

Ben Peart (4):
  read-cache: optimize expand_name_field() to speed up V4 index parsing.
  eoie: add End of Index Entry (EOIE) extension
  read-cache: load cache extensions on a worker thread
  read-cache: speed up index load through parallelization

 Documentation/config.txt                 |   6 +
 Documentation/technical/index-format.txt |  23 ++
 config.c                                 |  18 +
 config.h                                 |   1 +
 read-cache.c                             | 476 ++++++++++++++++++++---
 t/README                                 |  11 +
 t/t1700-split-index.sh                   |   1 +
 7 files changed, 487 insertions(+), 49 deletions(-)


base-commit: 29d9e3e2c47dd4b5053b0a98c891878d398463e3
-- 
2.18.0.windows.1





