On Mon, Aug 02, 2021 at 12:40:56PM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@xxxxxx> writes: > > > When loading references, we try to optimize loading of commits by using > > the commit graph. To do so, we first need to determine whether the > > object actually is a commit or not, which is why we always execute > > `oid_object_info()` first. Like this, we'll unpack the object header of > > each object first. > > > > This pattern can be quite inefficient in case many references point to > > the same commit: if the object didn't end up in the cached objects, then > > we'll repeatedly unpack the same object header, even if we've already > > seen the object before. > > ... > > Assuming that in almost all repositories, most references will point to > > either a tag or a commit, we'd have a modest increase in memory > > consumption of about 12.5% here. > > I wonder if we can also say almost all repositories, the majority of > refs point at the same object. If that holds, this would certainly > be a win, but otherwise, it is not so clear. I doubt that's the case in general. I rather assume that it's typically going to be a smallish subset that points to the same commit, but for these cases we at least avoid doing the lookup multiple times. As I said, it's definitely a tradeoff between memory and performance: in the worst case (all references point to different blobs) we allocate 33% more memory without having any speedups. A more realistic scenario would probably be something like a trunk-based development repo, where there's a single branch only and the rest is tags. There we'd allocate 11% more memory without any speedups. In general, it's going to be various shades of gray, where we allocate something from 0% to 11% more memory while getting some modest speedups in some cases. So if we only inspect this commit as a standalone it's definitely debatable whether we'd want to take it or not. But one important thing is that it's a prerequisite for patch 4/4: in order to not parse commits in case they're part of the commit-graph, we need to first obtain an object such that we can fill it in via the graph. So we have to call `lookup_unknown_object()` anyway. Might be sensible to document this as part of the commit message. Patrick
Attachment:
signature.asc
Description: PGP signature