Re: [PATCH v2 16/16] refs: reuse iterators when determining refname availability

shejialuo <shejialuo@xxxxxxxxx> · Mon, 24 Feb 2025 23:14:00 +0800

On Wed, Feb 19, 2025 at 02:23:43PM +0100, Patrick Steinhardt wrote:
> When verifying whether refnames are available we have to verify whether
> any reference exists that is nested under the current reference. E.g.
> given a reference "refs/heads/foo", we must make sure that there is no
> other reference "refs/heads/foo/*".
> 
> This check is performed using a ref iterator with the prefix set to the
> nested reference namespace. Until now it used to not be possible to
> reseek iterators, so we always had to reallocate the iterator for every
> single reference we're about to check. This keeps us from reusing state
> that the iterator may have and that may make it work more efficiently.
> 
> Refactor the logic to reseek iterators. This leads to a sizeable speedup
> with the "reftable" backend:
> 
>     Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):      39.8 ms ±   0.9 ms    [User: 29.7 ms, System: 9.8 ms]
>       Range (min … max):    38.4 ms …  42.0 ms    62 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):      31.9 ms ±   1.1 ms    [User: 27.0 ms, System: 4.5 ms]
>       Range (min … max):    29.8 ms …  34.3 ms    74 runs
> 
>     Summary
>       update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.25 ± 0.05 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
> 
> The "files" backend doesn't really show a huge impact:
> 
>     Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
>       Time (mean ± σ):     392.3 ms ±   7.1 ms    [User: 59.7 ms, System: 328.8 ms]
>       Range (min … max):   384.6 ms … 404.5 ms    10 runs
> 
>     Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
>       Time (mean ± σ):     387.7 ms ±   7.4 ms    [User: 54.6 ms, System: 329.6 ms]
>       Range (min … max):   377.0 ms … 397.7 ms    10 runs
> 
>     Summary
>       update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
>         1.01 ± 0.03 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
> 
> This is mostly because it is way slower to begin with because it has to
> create a separate file for each new reference, so the milliseconds we
> shave off by reseeking the iterator doesn't really translate into a
> significant relative improvement.

Interesting. With the "files" backend there are so many I/O operations
that they hide the compute latency. Even though we improve the compute
speed, the I/O operations still dominate the overall runtime.
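
As a rough back-of-the-envelope check, here is a small sketch that uses
only the mean times from the benchmarks quoted above. The `savings`
helper and the `timings` table are names made up for illustration and
are not part of the patch:

```python
# Mean wall-clock times in milliseconds, copied from the quoted benchmarks
# ("before" is revision HEAD~, "after" is revision HEAD).
timings = {
    "reftable": {"before": 39.8, "after": 31.9},
    "files": {"before": 392.3, "after": 387.7},
}


def savings(before, after):
    """Return (absolute saving in ms, saving as a fraction of `before`)."""
    saved = before - after
    return saved, saved / before


for backend, t in timings.items():
    saved, rel = savings(t["before"], t["after"])
    print(f"{backend}: saved {saved:.1f} ms ({rel:.1%} of the original runtime)")

# Prints:
# reftable: saved 7.9 ms (19.8% of the original runtime)
# files: saved 4.6 ms (1.2% of the original runtime)
```

So the absolute savings seem to be in the same ballpark for both
backends (a few milliseconds, within the measured noise), but for
"files" they are measured against a baseline that is roughly ten times
larger, which would explain the small relative improvement.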

Thanks,
Jialuo