Re: [PATCH] reftable: pass pq_entry by address

Han-Wen Nienhuys <hanwen@xxxxxxxxxx> · Thu, 15 Sep 2022 09:49:44 +0200

On Tue, Sep 13, 2022 at 8:03 PM Elijah Conners <business@xxxxxxxxxxxxxx> wrote:
>
> Han-Wen Nienhuys <hanwen@xxxxxxxxxx> writes:
>  > it might be a bit slower, but "dangerous"? How so?
> In this context, dangerous is the wrong word, but in some cases large objects on the stack can cause stack overflows. In this case, slower is the right word here.

I'll let you paint this bikeshed, but do note that the priority queue
isn't actually optimal here, in a much bigger way. In the typical
case, you'd have

1. large base reftable (created by GC)
2. small updates (created by individual ref updates)

When you're iterating, most of the iteration entries will come from
the large base reftable, and only occasionally, you have to get
entries from the small tables.
In this scenario, the current code will insert entries from the large
table into the priority queue, have it filter up to the top at cost
log(number-of-tables), for each of the entries to be read.

With the current online compaction, number-of-tables =
log(number-of-refs), so at log(log(number-of-refs)) it's not a huge
cost, but certainly larger than the cost of copying the entry once in
the function.

JGit has an optimization here where it tries to get the next entry
from the table that previously provided the minimum entry. I didn't
implement it for simplicity's sake, but if you care about performance,
you might want to try your hand at that.

-- 
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Liana Sebastian