Re: Poor performance using reftable with many refs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 13, 2025 at 03:31:47PM +0100, Patrick Steinhardt wrote:

> This turns out to be harder to implement than anticipated. While
> iterating through refnames and the ref iterator in tandem sounds nice,
> it would cause problems when the ref iterator yields millions of refs.
> You don't want to fully iterate through all of them.
> 
> What we really want to do is to reuse the iterator and have it skip
> entries: we'd basically create the iterator and re-seek it for every
> refname we want to check for collisions. This allows us to reuse some of
> the data structures, and in the best case the underlying backend knows
> to optimize.

Not having looked at the reftable code much, I'll take your word that it
isn't easy. ;)

I suspect the files backend isn't very good at this either. It knows to
take a starting point in ref_iterator_begin(), and should binary-search
the packed-refs file to the right point. But if you want to then ask it
"OK, now skip ahead until you see entry 'foo'", it would just walk
linearly to get there. I.e., there is no real seeking. So I guess the
best it could do is restart that same binary-search, and the order in
which we feed the refnames is not even really important.

And I don't know how seeking works with reftables, since "skip ahead"
requires merging all of the tables.

> But the bigger problem is that the generic reftable iterator does not
> yet have this capability. We first have to revamp their lifetime in the
> same way that I revamped the lifetime of reftable iterators. A generic
> iterator that has hit its end is currently getting free'd immediately,
> which I always found to be a bit awkward. But because of this it's
> impossible to reseek them, as they have lost all their state.

I do think if you feed the refnames in sorted order your seeks would
always be forward. So if you hit the end of iteration, there should be
nothing left to check.

> Oh, well, I guess that's what I'll be working on now then. But please,
> somebody stop me if I'm not seeing the forest for the trees.

No, I think it's actually a hard problem. :)

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux