Re: Poor performance using reftable with many refs

Patrick Steinhardt <ps@xxxxxx> · Thu, 13 Feb 2025 14:21:24 +0100

On Thu, Feb 13, 2025 at 10:27:39AM +0100, Christian Couder wrote:
> On Thu, Feb 13, 2025 at 8:13 AM Patrick Steinhardt <ps@xxxxxx> wrote:
> 
> > We end up with two tables: the first one has been created when cloning
> > the repository and contains all references. The second one has been
> > created when deleting all references, so it only contains ref deletions.
> > Because deletions don't have to carry an object ID, the resulting table
> > is also much smaller. This has the effect that auto-compaction does not
> > kick in, because we see that the geometric sequence is still intact.
> 
> Not that I think we should work on this right now, but theoretically,
> could we "just" count the number of entries in each file and base the
> geometric sequence on the number of entries in each file instead of
> file size?

In theory we could, and that may indeed lead to better results in edge
cases like these. If either the header or the footer of a reftable
contained a total count of its records, that might have been a viable
thing to do. But they don't, so we'd have to open and fully parse every
reftable to get those counts.

Because of that I think the cost of this would ultimately outweigh the
benefit. After all, this logic kicks in on every write to determine if we
need to auto-compact. As a result, it needs to remain cheap.
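
To make the difference concrete, here is a minimal sketch in Python. It
is not the actual reftable code: the function name, the factor of 2 and
all numbers are assumptions picked for illustration, and the check is a
simplified version of the heuristic (it only asks whether each table is
at least `factor` times as large as the next newer one).

```python
def geometric_sequence_intact(sizes, factor=2):
    """Return True if every table is at least `factor` times as large
    as the next newer table.

    `sizes` is ordered from oldest to newest table. The unit can be
    bytes or record counts; the check itself does not care.
    """
    return all(older >= factor * newer
               for older, newer in zip(sizes, sizes[1:]))


# Hypothetical two-table stack: one table written by the clone, one
# written when deleting every ref. The deletion table holds the same
# number of records but is much smaller on disk.
by_file_size = [60_000_000, 20_000_000]   # bytes (made-up values)
by_record_count = [1_000_000, 1_000_000]  # records (made-up values)

print(geometric_sequence_intact(by_file_size))     # True
print(geometric_sequence_intact(by_record_count))  # False
```

Measured by file size the sequence looks intact, so nothing would be
compacted. Measured by record count it is violated, so compaction would
kick in. The catch is that file sizes come from a cheap stat call per
table, whereas the record counts would require reading each table.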

Patrick