On Tue, Jun 14, 2022 at 08:35:16PM -0400, Taylor Blau wrote:

> On Tue, Jun 14, 2022 at 01:27:18PM -0400, Derrick Stolee wrote:
>
> > > Did you have any other sort of performance test in mind? The remotes we
> > > typically deal with are geographically far away and deal with a high volume
> > > of traffic so we're keen to move behaviour to the client where it makes sense
> > > to do so.
> >
> > I guess I wonder how large your promisor pack-files are in this test,
> > since your implementation depends on for_each_packed_object(), which
> > should be really inefficient if you're actually dealing with a large
> > partial clone.
>
> I had the same thought. Storing data available in the promisor packs
> into an oid_map is going to be expensive if there are many such objects.
>
> Is there a reason that we can't introduce a variant of
> find_kept_pack_entry() that deals only with .promisor packs and look
> these things up as-needed?

It's much worse than that. The promisor mechanism is fundamentally very
inefficient in runtime, optimizing instead for size. Imagine I have a
partial clone and I retrieve tree X, which points to a blob Y that I
don't get. I have X in a promisor pack, and asking about it is
efficient. But if I want to know about Y, I have no data structure
mentioning Y except the tree X itself. So to enumerate all of the
promisor edges, I have to walk all of the trees in the promisor pack.

So it is not just lookup, but actual tree walking that is expensive. The
flip side is that you don't have to store a complete separate list of
the promised objects. Whether that's a win depends on how many local
objects you have, versus how many are promised.

But it would be possible to cache the promisor list to make the tradeoff
separately. E.g., do the walk over the promisor trees once (perhaps at
pack creation time), and store a sorted list of fixed-length (oid, type)
records that could be binary searched. You could even put it in the
.promisor file. :)

-Peff
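
To make the caching idea concrete, here is a rough sketch in Python rather
than git's C. It is only an illustration under stated assumptions, not git
code: the names fake_oid(), build_cache() and lookup() are hypothetical, the
trees are a toy dict instead of real pack data, and the 21-byte record layout
(20-byte oid plus one type byte) is an assumed format, not an existing
on-disk one. The type values follow git's object type numbering (tree = 2,
blob = 3).

```python
import hashlib

OID_LEN = 20            # assumed SHA-1 sized object ids
REC_LEN = OID_LEN + 1   # assumed record layout: oid followed by one type byte
TYPE_TREE, TYPE_BLOB = 2, 3


def fake_oid(name):
    """Hypothetical helper: derive a 20-byte stand-in oid from a label."""
    return hashlib.sha1(name.encode()).digest()


def build_cache(trees, present):
    """Walk every promisor tree once and record the edges we do not have.

    trees:   {tree_oid: [(child_oid, child_type), ...]} (toy stand-in for
             the trees stored in a promisor pack)
    present: set of oids that are actually available locally
    Returns one bytes object of sorted fixed-length (oid, type) records.
    """
    promised = {}
    for entries in trees.values():
        for oid, otype in entries:
            if oid not in present:
                promised[oid] = otype
    return b"".join(oid + bytes([t]) for oid, t in sorted(promised.items()))


def lookup(cache, oid):
    """Binary search the record list; return the type byte or None."""
    lo, hi = 0, len(cache) // REC_LEN
    while lo < hi:
        mid = (lo + hi) // 2
        rec = cache[mid * REC_LEN:(mid + 1) * REC_LEN]
        key = rec[:OID_LEN]
        if key < oid:
            lo = mid + 1
        elif key > oid:
            hi = mid
        else:
            return rec[OID_LEN]
    return None


# Tree X points at blob Y (not fetched) and blob Z (fetched).
x, y, z = fake_oid("tree X"), fake_oid("blob Y"), fake_oid("blob Z")
cache = build_cache({x: [(y, TYPE_BLOB), (z, TYPE_BLOB)]}, present={x, z})
```

The expensive part (the tree walk in build_cache()) happens once, and each
later question about Y is an O(log n) probe into the record list instead of
another walk. The cost is the extra storage for one record per promised
object, which is the tradeoff described above.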