Re: [PATCH v3 5/8] reftable/record: store "val1" hashes as static arrays

Patrick Steinhardt <ps@xxxxxx> · Tue, 6 Feb 2024 07:03:18 +0100

On Mon, Feb 05, 2024 at 03:39:31AM -0800, Karthik Nayak wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > When reading ref records of type "val1", we store its object ID in an
> > allocated array. This results in an additional allocation for every
> > single ref record we read, which is rather inefficient especially when
> > iterating over refs.
> >
> > Refactor the code to instead use an embedded array of `GIT_MAX_RAWSZ`
> > bytes. While this means that `struct ref_record` is bigger now, we
> > typically do not store all refs in an array anyway and instead only
> > handle a limited number of records at the same point in time.
> >
> > Using `git show-ref --quiet` in a repository with ~350k refs this leads
> > to a significant drop in allocations. Before:
> >
> >     HEAP SUMMARY:
> >         in use at exit: 21,098 bytes in 192 blocks
> >       total heap usage: 2,116,683 allocs, 2,116,491 frees, 76,098,060 bytes allocated
> >
> > After:
> >
> >     HEAP SUMMARY:
> >         in use at exit: 21,098 bytes in 192 blocks
> >       total heap usage: 1,419,031 allocs, 1,418,839 frees, 62,145,036 bytes allocated
> 
> Curious, did you also do perf benchmarking on this?

I didn't back then, but here you go. The following benchmark has
show-ref print the single ref that matches a specific pattern out of
1 million refs:

    Benchmark 1: show-ref: single matching ref (revision = HEAD~)
      Time (mean ± σ):     191.1 ms ±   5.2 ms    [User: 188.1 ms, System: 2.8 ms]
      Range (min … max):   186.2 ms … 214.5 ms    100 runs

    Benchmark 2: show-ref: single matching ref (revision = HEAD)
      Time (mean ± σ):     189.7 ms ±   5.3 ms    [User: 186.7 ms, System: 2.8 ms]
      Range (min … max):   184.1 ms … 213.4 ms    100 runs

    Summary
      show-ref: single matching ref (revision = HEAD) ran
        1.01 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Not much of a win here, which is probably expected. On glibc the
allocator seems to be really efficient at churning out many small blocks
of memory, which is also something I have noticed in other contexts. I
do expect that other platforms might see more significant results.
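
For anyone reading along without the patch at hand, the change boils
down to roughly the following. This is only a sketch: the struct and
function names as well as `SKETCH_MAX_RAWSZ` are made up for
illustration and are not the actual reftable code. The constant stands
in for `GIT_MAX_RAWSZ`, which is assumed here to be 32 bytes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for GIT_MAX_RAWSZ (32 bytes fits a SHA-256 hash). */
#define SKETCH_MAX_RAWSZ 32

/* Before: the hash of a "val1" record lives in a separate heap block. */
struct ref_heap {
	unsigned char *val1;
};

/* After: the hash is embedded in the record itself. */
struct ref_embedded {
	unsigned char val1[SKETCH_MAX_RAWSZ];
};

/* Costs one malloc() per record read. Returns 0 on success, -1 on failure. */
int read_heap(struct ref_heap *r, const unsigned char *src, size_t hash_size)
{
	r->val1 = malloc(hash_size);
	if (!r->val1)
		return -1;
	memcpy(r->val1, src, hash_size);
	return 0;
}

/* The heap variant also needs a matching free() per record. */
void release_heap(struct ref_heap *r)
{
	free(r->val1);
	r->val1 = NULL;
}

/* No allocation at all: the bytes are copied straight into the record. */
int read_embedded(struct ref_embedded *r, const unsigned char *src, size_t hash_size)
{
	if (hash_size > sizeof(r->val1))
		return -1;
	memcpy(r->val1, src, hash_size);
	return 0;
}
```

The embedded variant makes every record larger, but it drops one
malloc/free pair per record, which is where the lower allocation count
in the quoted heap summaries comes from.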

Patrick