On Thu, Feb 13, 2025 at 03:31:47PM +0100, Patrick Steinhardt wrote: > This turns out to be harder to implement than anticipated. While > iterating through refnames and the ref iterator in tandem sounds nice, > it would cause problems when the ref iterator yields millions of refs. > You don't want to fully iterate through all of them. > > What we really want to do is to reuse the iterator and have it skip > entries: we'd basically create the iterator and re-seek it for every > refname we want to check for collisions. This allows us to reuse some of > the data structures, and in the best case the underlying backend knows > to optimize. Not having looked at the reftable code much, I'll take your word that it isn't easy. ;) I suspect the files backend isn't very good at this either. It knows to take a starting point in ref_iterator_begin(), and should binary-search the packed-refs file to the right point. But if you want to then ask it "OK, now skip ahead until you see entry 'foo'", it would just walk linearly to get there. I.e., there is no real seeking. So I guess the best it could do is restart that same binary-search, and the order in which we feed the refnames is not even really important. And I don't know how seeking works with reftables, since "skip ahead" requires merging all of the tables. > But the bigger problem is that the generic reftable iterator does not > yet have this capability. We first have to revamp their lifetime in the > same way that I revamped the lifetime of reftable iterators. A generic > iterator that has hit its end is currently getting free'd immediately, > which I always found to be a bit awkward. But because of this it's > impossible to reseek them, as they have lost all their state. I do think if you feed the refnames in sorted order your seeks would always be forward. So if you hit the end of iteration, there should be nothing left to check. > Oh, well, I guess that's what I'll be working on now then. But please, > somebody stop me if I'm not seeing the forest for the trees. No, I think it's actually a hard problem. :) -Peff