On Thu, Feb 13, 2025 at 12:20:03PM +0100, Patrick Steinhardt wrote:
> On Thu, Feb 13, 2025 at 03:22:21AM -0500, Jeff King wrote:
> > On Thu, Feb 13, 2025 at 08:13:38AM +0100, Patrick Steinhardt wrote:
> >
> > > Turns out that you're hitting quite a funny edge case: the issue comes
> > > from you first deleting all preexisting refs in the target repository
> > > before recreating them. With "packed-refs", this leads to a repository
> > > that has neither a "packed-refs" file nor any loose ref, except for HEAD
> > > of course. But with "reftables" it doesn't:
> > >
> > > total 368
> > > -rw-r--r-- 1 pks users 332102 Feb 13 08:00 0x000000000001-0x000000000001-d8285c7c.ref
> > > -rw-r--r-- 1 pks users 32941 Feb 13 08:00 0x000000000002-0x000000000003-f1a8ebf9.ref
> > > -rw-r--r-- 1 pks users 86 Feb 13 08:00 tables.list
> > >
> > > We end up with two tables: the first one has been created when cloning
> > > the repository and contains all references. The second one has been
> > > created when deleting all references, so it only contains ref deletions.
> > > Because deletions don't have to carry an object ID, the resulting table
> > > is also much smaller. This has the effect that auto-compaction does not
> > > kick in, because we see that the geometric sequence is still intact. And
> > > consequently, all the checks that we perform when recreating the refs
> > > are way more expensive now because we have to search for conflicts.
> >
> > That makes sense. But that's only 360k of reftables. Why does it take so
> > long to process?
> >
> > It's been a while since I looked at reftables, but I'd think for a
> > normal lookup we should be able to binary-search or similar in each
> > table, find the relevant entries, and be done.
> >
> > But I guess we can't easily do that for finding write conflicts, because
> > writing "foo/bar" means we need to care about "foo" and "foo/bar/baz" as
> > well. Finding "foo" is easy; we just break apart the proposed refname
> > and look for each leading path. But "foo/bar/baz" is harder; we have to
> > merge the tables to get an authoritative sorted list of the current
> > state, and then look for the entries adjacent to where our proposed ref
> > goes. Looking at a profiling output, we're spending a lot of time in
> > merged_iter_next_void() and its children, which supports that.
> >
> > But the run-time scales linearly with the number of refs we're adding,
> > which implies that we're doing this iteration independently once per ref
> > we're adding. Instead, if we're given a list of N refs to write, we
> > should be able to sort that list and walk it in parallel with the
> > merged_iter result, making a single pass over the lists.
> >
> > So I guess we'd need a version of refs_verify_refname_available() that
> > takes a list of refs rather than a single name. And then you could do
> > that single-pass walk. And as a bonus, you'd be able to de-dup the
> > leading prefixes you're looking for (e.g., most of your refs will start
> > with "refs/heads/", so we only need to check it once).
>
> Yes, `refs_verify_refname_available()` is exactly the problem. We spend
> ~80% of the time in that function after the optimization I have pointed
> out for `repo_get_oid()`. I assume that we'd see similar performance for
> the "files" backend if we had 360k refs and inserted 360k other refs,
> but haven't verified this claim.
>
> I've already noticed multiple times that this function is a significant
> slowdown in lots of cases. I've already started looking at it a bit, and
> will think about ways to fix this.
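
To make that idea concrete: conceptually, the batched check proposed
above would look something like the sketch below. This is plain,
self-contained example C rather than actual git code; the function
name, the helpers and the in-memory arrays are all made up for
illustration, and details like the transaction's "skip" list or exact
matches are ignored.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Does "name" live underneath "dir", i.e. does it start with "dir" + "/"? */
static int is_nested_under(const char *name, const char *dir)
{
        size_t len = strlen(dir);
        return !strncmp(name, dir, len) && name[len] == '/';
}

static int cmp_str(const void *a, const void *b)
{
        return strcmp(*(const char *const *)a, *(const char *const *)b);
}

static int ref_exists(const char *name, const char **existing, size_t nr)
{
        return bsearch(&name, existing, nr, sizeof(*existing), cmp_str) != NULL;
}

/*
 * Check a whole batch of proposed refnames at once. Both arrays are
 * sorted; "existing" stands in for what the merged reftable iterator
 * would yield. Returns 0 if all proposed refs can be created, -1 on
 * the first directory/file conflict.
 */
static int verify_refnames_available(const char **existing, size_t nr_existing,
                                     const char **proposed, size_t nr_proposed)
{
        size_t i = 0, j;

        for (j = 0; j < nr_proposed; j++) {
                const char *want = proposed[j];
                const char *slash;

                /*
                 * An existing "foo" blocks a new "foo/bar": look up each
                 * leading path of the proposed name. This is also where
                 * common prefixes like "refs/heads" could be de-duplicated
                 * across the whole batch.
                 */
                for (slash = strchr(want, '/'); slash;
                     slash = strchr(slash + 1, '/')) {
                        char *prefix = strndup(want, (size_t)(slash - want));
                        int found = ref_exists(prefix, existing, nr_existing);

                        if (found)
                                fprintf(stderr, "'%s' exists; cannot create '%s'\n",
                                        prefix, want);
                        free(prefix);
                        if (found)
                                return -1;
                }

                /*
                 * An existing "foo/bar/baz" blocks a new "foo/bar": because
                 * both lists are sorted, a single forward-only cursor into
                 * the existing refs suffices for the whole batch.
                 */
                while (i < nr_existing && strcmp(existing[i], want) < 0)
                        i++;
                if (i < nr_existing && is_nested_under(existing[i], want)) {
                        fprintf(stderr, "'%s' exists; cannot create '%s'\n",
                                existing[i], want);
                        return -1;
                }
        }
        return 0;
}

int main(void)
{
        const char *existing[] = { "refs/heads/main", "refs/heads/topic/a" };
        const char *proposed[] = { "refs/heads/topic", "refs/tags/v1.0" };

        return verify_refnames_available(existing, 2, proposed, 2) ? 1 : 0;
}

The important property is that the cursor into the existing refs only
ever moves forward, so checking all proposed refs is one merge-like
pass plus a handful of cheap prefix lookups, instead of one full scan
per proposed ref. In the real code the existing refs are of course not
an in-memory array, but whatever the merged reftable iterator yields.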

This turns out to be harder to implement than anticipated. While
iterating through refnames and the ref iterator in tandem sounds nice,
it would cause problems when the ref iterator yields millions of refs:
you don't want to fully iterate through all of them.

What we really want to do is to reuse the iterator and have it skip
entries: we'd basically create the iterator once and then re-seek it
for every refname we want to check for collisions (a rough sketch of
that pattern is appended below my sign-off). This allows us to reuse
some of the data structures, and in the best case the underlying
backend knows how to optimize such seeks.

This is something I have spent a significant amount of time
implementing for the reftable backend over the last couple of months.
While we do have reseekable iterators there now, which already got us a
bit of a performance improvement due to more reuse of data structures,
we don't yet optimize for specific seek patterns. We could e.g. easily
skip re-reading a block if we already know that it contains the
reference we're searching for.

But the bigger problem is that the generic ref iterator does not yet
have this capability. We first have to revamp its lifetime in the same
way that I revamped the lifetime of reftable iterators: a generic
iterator that has hit its end is currently freed immediately, which I
always found to be a bit awkward, and because of this it is impossible
to reseek it, as it has lost all of its state.

Oh well, I guess that's what I'll be working on now then. But please,
somebody stop me if I'm not seeing the forest for the trees.

Patrick
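
For reference, here is a minimal, self-contained model of the iterator
shape I'm after. This is plain example C, not the actual ref_iterator
or reftable API, and all names are made up. The only point it tries to
make is that seek() can be called any number of times on the same
iterator, even after it has been exhausted, because hitting the end no
longer throws away its state:

/*
 * Minimal model of a reseekable iterator (not git's actual ref_iterator
 * or reftable API): the iterator keeps its state for its whole lifetime,
 * so it can be re-seeked arbitrarily often, even after it has returned
 * its last entry. Here the "table" is just a sorted array of refnames.
 */
#include <stdio.h>
#include <string.h>

struct sketch_iter {
        const char **refs; /* sorted refnames, stands in for the merged tables */
        size_t nr;
        size_t pos;        /* current position, reset by seek() */
};

/* (Re-)position the iterator at the first ref that sorts >= "name". */
static void sketch_iter_seek(struct sketch_iter *it, const char *name)
{
        /* A real backend would binary-search and reuse already-loaded blocks. */
        it->pos = 0;
        while (it->pos < it->nr && strcmp(it->refs[it->pos], name) < 0)
                it->pos++;
}

/* Return the next refname, or NULL once the iterator is exhausted. */
static const char *sketch_iter_next(struct sketch_iter *it)
{
        return it->pos < it->nr ? it->refs[it->pos++] : NULL;
}

int main(void)
{
        const char *existing[] = { "refs/heads/main", "refs/heads/topic/a" };
        const char *proposed[] = { "refs/heads/topic", "refs/tags/v1.0" };
        struct sketch_iter it = { existing, 2, 0 };
        size_t i;

        for (i = 0; i < 2; i++) {
                const char *want = proposed[i];
                size_t len = strlen(want);
                const char *ref;

                /*
                 * One iterator, re-seeked once per refname we check; its
                 * allocations (here: none) would be reused across seeks,
                 * and exhausting it does not invalidate it.
                 */
                sketch_iter_seek(&it, want);
                ref = sketch_iter_next(&it);
                if (ref && !strncmp(ref, want, len) && ref[len] == '/')
                        printf("'%s' conflicts with existing '%s'\n", want, ref);
        }
        return 0;
}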