Re: [PATCH v3 3/4] revision: avoid loading object headers multiple times

On Mon, Aug 02, 2021 at 12:40:56PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@xxxxxx> writes:
> 
> > When loading references, we try to optimize loading of commits by using
> > the commit graph. To do so, we first need to determine whether the
> > object actually is a commit or not, which is why we always execute
> > `oid_object_info()` first. Like this, we'll unpack the object header of
> > each object first.
> >
> > This pattern can be quite inefficient in case many references point to
> > the same commit: if the object didn't end up in the cached objects, then
> > we'll repeatedly unpack the same object header, even if we've already
> > seen the object before.
> > ...
> > Assuming that in almost all repositories, most references will point to
> > either a tag or a commit, we'd have a modest increase in memory
> > consumption of about 12.5% here.
> 
> I wonder if we can also say almost all repositories, the majority of
> refs point at the same object.  If that holds, this would certainly
> be a win, but otherwise, it is not so clear.

I doubt that's the case in general. I'd rather assume that it's typically
a smallish subset of refs that points to the same commit, but for those
cases we at least avoid doing the lookup multiple times. As I said, it's
definitely a tradeoff between memory and performance: in the worst case
(all references point to different blobs) we allocate 33% more memory
without any speedup. A more realistic scenario would probably be
something like a trunk-based development repo, where there's only a
single branch and the rest is tags. There we'd allocate 11% more memory
without any speedup. In general, it's going to be various shades of
gray, where we allocate somewhere between 0% and 11% more memory while
getting modest speedups in some cases.
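
To make the arithmetic behind these percentages a bit more tangible, here
is a rough sketch of how the extra memory could be estimated for a given
mix of refs. Note that the sizes below are made-up units, picked only so
that the ratios come out at roughly 33% for blobs and 11% for tags; they
are not the actual struct sizes in Git, and `extra_memory_percent()` is a
hypothetical helper, not something from the patch.

```python
# Hypothetical per-type allocation sizes in arbitrary units. These are NOT
# Git's real struct sizes; they are chosen only to illustrate the ratios.
TYPED_SIZES = {"commit": 40, "tag": 36, "blob": 30}

# Hypothetical size of an object allocated without knowing its type, which
# has to be large enough to hold the biggest of the typed variants.
UNKNOWN_SIZE = max(TYPED_SIZES.values())


def extra_memory_percent(ref_counts, typed_sizes=TYPED_SIZES,
                         unknown_size=UNKNOWN_SIZE):
    """Relative extra memory when every ref target is allocated with the
    type-agnostic size instead of its exact typed size."""
    baseline = sum(count * typed_sizes[kind]
                   for kind, count in ref_counts.items())
    with_unknown = sum(ref_counts.values()) * unknown_size
    return 100.0 * (with_unknown - baseline) / baseline


# Worst case: every ref points to a different blob.
worst = extra_memory_percent({"blob": 1000})

# Trunk-based repo: a single branch, everything else is tags.
trunk = extra_memory_percent({"commit": 1, "tag": 999})

# Only commits: nothing is wasted under these assumptions.
commits_only = extra_memory_percent({"commit": 1000})
```

With real struct sizes the exact numbers would of course differ, but the
shape of the tradeoff stays the same: the smaller the typed object is
compared to the type-agnostic allocation, the more memory is wasted.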

So if we look at this commit in isolation, it's definitely debatable
whether we'd want to take it or not. But one important thing is that
it's a prerequisite for patch 4/4: in order to not parse commits that
are part of the commit-graph, we first need to obtain an object that we
can fill in via the graph. So we have to call `lookup_unknown_object()`
anyway. It might be sensible to document this as part of the commit
message.
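
As a very loose model of the flow described above (not the actual C code,
and with all names such as `Store`, `lookup_unknown()` and
`load_ref_target()` being hypothetical), the idea is roughly: obtain one
shared object per object ID first, fill it in from the graph if possible,
and only fall back to reading the object header when the type is still
unknown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Obj:
    oid: str
    type: Optional[str] = None  # None means "type not determined yet"
    from_graph: bool = False


class Store:
    """Toy stand-in for an object store; purely illustrative."""

    def __init__(self, odb, commit_graph):
        self.odb = odb                    # oid -> type, models the object database
        self.commit_graph = commit_graph  # set of commit oids known to the graph
        self.objects = {}                 # oid -> Obj, one shared object per oid
        self.header_reads = 0             # counts the expensive header lookups

    def lookup_unknown(self, oid):
        # Return the one object for this oid, creating it untyped if needed.
        obj = self.objects.get(oid)
        if obj is None:
            obj = Obj(oid)
            self.objects[oid] = obj
        return obj

    def load_ref_target(self, oid):
        obj = self.lookup_unknown(oid)
        if obj.type is not None:
            # Seen before: no need to look at the object header again.
            return obj
        if oid in self.commit_graph:
            # The graph tells us this is a commit, so fill the object in.
            obj.type = "commit"
            obj.from_graph = True
            return obj
        # Fallback: read the header from the object database.
        self.header_reads += 1
        obj.type = self.odb[oid]
        return obj


store = Store(odb={"a": "commit", "b": "blob"}, commit_graph={"a"})
for ref_target in ["a", "a", "b", "b", "b"]:
    store.load_ref_target(ref_target)
```

In this toy model the five refs cause only a single header read, because
the commit is served by the graph and the blob's type is remembered on
the shared object after the first lookup.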

Patrick
