On Mon, Mar 22, 2021 at 03:40:46PM +0100, Han-Wen Nienhuys wrote:

> > I left some numbers in another part of the thread, but IMHO performance
> > isn't that compelling a reason to do this these days, if you are using
> > commit-graphs.
> >
> > Just walking the reflog might be _slightly_ faster, though not
> > necessarily (it depends on whether the depth of the object graph or the
> > depth of the reflog chain is deeper). It might matter more if you are
> > using a more exotic storage scheme, where switching from accessing
> > reflogs to objects implies extra round-trips to a server (e.g., custom
> > storage backends with JGit; I don't know the state of the art in what
> > Google is doing there).
>
> JGit doesn't currently support commit-graph, so it's hard to predict
> what performance will be like, but isn't commit-graph is keyed by
> SHA1? That makes it hard to do caching, especially when considering
> large repositories.

Yes, it's keyed by sha1. It's essentially replacing "inflate the commit
object and parse it" with "here are the parsed values as mmap-able
32-bit integer fields" (there's some other stuff with generation
numbers, too, but the main speedup is simply that accessing each commit
is orders of magnitude cheaper). It caches well, because those
properties of the commit are immutable.

But if you meant "when pulling data from the commit-graph file, is it
friendly to block cache", then no, it's not linear. You'd binary search
within it to find each commit, just as you would a pack .idx (and just
like a .idx, I'd expect a system that is pulling data from a network
source to want to grab the whole commit-graph file. They tend to be much
smaller than the main .idx for a given repo).

> AFAIU, commit-graph would help speed up reachability checks, by being
> able to shortcut cases where the commit number proves that some commit
> is not ancestor of the other, but you still have to do a revwalk to
> conclusively prove reachability.

Right. You'll still walk a lot of the commits, but you'll do so much
faster (the generation numbers can also help prune some uninteresting
side paths, but again, I think the main value for this operation is just
getting the parent info much faster).

-Peff
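
To make the two mechanisms above concrete (binary-search lookup by id, and
a reachability walk that uses generation numbers to skip side paths),
here is a rough Python sketch. It is a toy model, not Git's or JGit's
code and not the on-disk commit-graph format: the names ToyCommitGraph,
lookup and is_ancestor are made up for illustration, the ids are short
strings rather than SHA-1s, and the generation number is assumed to be
"1 + the largest generation among the parents".

```python
from bisect import bisect_left
from collections import deque


class ToyCommitGraph:
    """Toy stand-in for a commit-graph: sorted ids plus pre-parsed fields."""

    def __init__(self, commits):
        # `commits` maps a commit id to the list of its parent ids.
        self.ids = sorted(commits)
        position = {oid: i for i, oid in enumerate(self.ids)}
        # Parents are stored as positions, so a walk never re-parses objects.
        self.parents = [[position[p] for p in commits[oid]] for oid in self.ids]
        self.generation = [0] * len(self.ids)
        for i in range(len(self.ids)):
            self._fill_generation(i)

    def _fill_generation(self, pos):
        # Assumed definition: 1 + max generation of the parents; roots get 1.
        if self.generation[pos] == 0:
            self.generation[pos] = 1 + max(
                (self._fill_generation(p) for p in self.parents[pos]), default=0
            )
        return self.generation[pos]

    def lookup(self, oid):
        # Binary search over the sorted id table, much like a pack .idx lookup.
        i = bisect_left(self.ids, oid)
        return i if i < len(self.ids) and self.ids[i] == oid else None


def is_ancestor(graph, ancestor, descendant):
    """Walk back from `descendant`, looking for `ancestor`."""
    target = graph.lookup(ancestor)
    start = graph.lookup(descendant)
    if target is None or start is None:
        return False
    cutoff = graph.generation[target]
    seen = {start}
    queue = deque([start])
    while queue:
        pos = queue.popleft()
        if pos == target:
            return True
        for parent in graph.parents[pos]:
            # A commit with a lower generation than the target cannot
            # reach it, so that whole side path is skipped.
            if parent not in seen and graph.generation[parent] >= cutoff:
                seen.add(parent)
                queue.append(parent)
    return False


# History: a <- b <- c <- e, and a <- d <- e (e is a merge of c and d).
graph = ToyCommitGraph(
    {"a": [], "b": ["a"], "c": ["b"], "d": ["a"], "e": ["c", "d"]}
)
print(is_ancestor(graph, "b", "e"))  # True
print(is_ancestor(graph, "d", "c"))  # False
```

Even with the cutoff, the positive case still visits every commit between
the two endpoints. The pruning only avoids descending below the target's
generation, which matches the point that the main win is cheap access to
the parent info rather than walking fewer commits.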