On Fri, Oct 06, 2023 at 06:09:25PM +0000, Victoria Dye via GitGitGadget wrote: > While investigating ref iteration performance in builtins like > 'for-each-ref' and 'show-ref', I found two small improvement opportunities. > > The first patch tweaks the logic around prefix matching in > 'cache_ref_iterator_advance' so that we correctly skip refs that do not > actually match a given prefix. The unnecessary iteration doesn't seem to be > causing any bugs in the ref iteration commands that I've tested, but it > doesn't hurt to be more precise (and it helps with some other patches I'm > working on ;) ). > > The next three patches update how 'loose_fill_ref_dir' determines the type > of ref cache entry to create (directory or regular). On platforms that > include d_type information in 'struct dirent' (as far as I can tell, all > except NonStop & certain versions of Cygwin), this allows us to skip calling > 'stat'. In ad-hoc testing, this improved performance of 'git for-each-ref' > by about 20%. I've done a small set of benchmarks with my usual test repositories, which is linux.git with a bunch of references added. The repository comes in four sizes: - small: 50k references - medium: 500k references - high: 1.1m references - huge: 12m references Unfortunately, I couldn't really reproduce the performance improvements. In fact, the new version runs consistently a tiny bit slower than the old version: # Old version, which is 3a06386e31 (The fifteenth batch, 2023-10-04). Benchmark 1: git for-each-ref (revision=old,refcount=small) Time (mean ± σ): 135.5 ms ± 1.2 ms [User: 76.4 ms, System: 59.0 ms] Range (min … max): 134.8 ms … 136.9 ms 3 runs Benchmark 2: git for-each-ref (revision=old,refcount=medium) Time (mean ± σ): 822.7 ms ± 2.2 ms [User: 697.4 ms, System: 125.1 ms] Range (min … max): 821.1 ms … 825.2 ms 3 runs Benchmark 3: git for-each-ref (revision=old,refcount=high) Time (mean ± σ): 1.960 s ± 0.015 s [User: 1.702 s, System: 0.257 s] Range (min … max): 1.944 s … 1.973 s 3 runs # New version, which is your tip. Benchmark 4: git for-each-ref (revision=old,refcount=huge) Time (mean ± σ): 16.815 s ± 0.054 s [User: 15.091 s, System: 1.722 s] Range (min … max): 16.760 s … 16.869 s 3 runs Benchmark 5: git for-each-ref (revision=new,refcount=small) Time (mean ± σ): 136.0 ms ± 0.2 ms [User: 78.8 ms, System: 57.1 ms] Range (min … max): 135.8 ms … 136.2 ms 3 runs Benchmark 6: git for-each-ref (revision=new,refcount=medium) Time (mean ± σ): 830.4 ms ± 21.2 ms [User: 691.3 ms, System: 138.7 ms] Range (min … max): 814.2 ms … 854.5 ms 3 runs Benchmark 7: git for-each-ref (revision=new,refcount=high) Time (mean ± σ): 1.966 s ± 0.013 s [User: 1.717 s, System: 0.249 s] Range (min … max): 1.952 s … 1.978 s 3 runs Benchmark 8: git for-each-ref (revision=new,refcount=huge) Time (mean ± σ): 16.945 s ± 0.037 s [User: 15.182 s, System: 1.760 s] Range (min … max): 16.910 s … 16.983 s 3 runs Summary git for-each-ref (revision=old,refcount=small) ran 1.00 ± 0.01 times faster than git for-each-ref (revision=new,refcount=small) 6.07 ± 0.06 times faster than git for-each-ref (revision=old,refcount=medium) 6.13 ± 0.17 times faster than git for-each-ref (revision=new,refcount=medium) 14.46 ± 0.17 times faster than git for-each-ref (revision=old,refcount=high) 14.51 ± 0.16 times faster than git for-each-ref (revision=new,refcount=high) 124.09 ± 1.15 times faster than git for-each-ref (revision=old,refcount=huge) 125.05 ± 1.12 times faster than git for-each-ref (revision=new,refcount=huge) The performance regression isn't all that concerning, but it makes me wonder why I see things becoming slower rather than faster. My guess is that this is because all my test repositories are well-packed and don't have a lot of loose references. But I just wanted to confirm how you benchmarked your change and what the underlying shape of your test repo was. Patrick > Thanks! > > * Victoria > > Victoria Dye (4): > ref-cache.c: fix prefix matching in ref iteration > dir.[ch]: expose 'get_dtype' > dir.[ch]: add 'follow_symlink' arg to 'get_dtype' > files-backend.c: avoid stat in 'loose_fill_ref_dir' > > diagnose.c | 42 +++--------------------------------------- > dir.c | 33 +++++++++++++++++++++++++++++++++ > dir.h | 16 ++++++++++++++++ > refs/files-backend.c | 14 +++++--------- > refs/ref-cache.c | 3 ++- > 5 files changed, 59 insertions(+), 49 deletions(-) > > > base-commit: 3a06386e314565108ad56a9bdb8f7b80ac52fb69 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1594%2Fvdye%2Fvdye%2Fref-iteration-cleanup-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1594/vdye/vdye/ref-iteration-cleanup-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/1594 > -- > gitgitgadget
Attachment:
signature.asc
Description: PGP signature