On Tue, Feb 09, 2021 at 01:14:17PM -0800, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
> > I don't know that it's really worth digging into that much, though it's
> > quite possible there may be some easy wins by optimizing those memcpy
> > calls. E.g., I'm not sure if the compiler ends up inlining them or not.
> > If it doesn't realize that the_hash_algo->rawsz is only ever "20" or
> > "32", we could perhaps help it along with specialized versions of
> > hashcpy(). If somebody does want to play with it, this patch may make a
> > good testbed. :)
>
> Yuck. That reminds me of the adventure Shawn made in the Java land,
> benchmarking which one among int[5], int a,b,c,d,e, and char[40] is
> the most efficient way (both storage-wise and performance-wise) to
> store a SHA-1 hash. I wish we didn't have to go there.
>
> It indeed is an interesting, though a bit sad, observation that
> even with good precomputed information, an overly heavy interface
> can kill the potential performance benefit.

Agreed. But I'm hoping we can continue to mostly ignore it. I suspect
this finding means we are wasting a few hundred milliseconds copying
oids around during a clone of torvalds/linux. But overall that is a
pretty heavy-weight operation, and I doubt anybody really notices. And
for something as lightweight as --disk-usage, it was easy enough to
optimize around it.

It probably does have a more measurable impact in something like:

  git rev-list --use-bitmap-index --objects HEAD >/dev/null

where we really do need those oids, and the extra copying might add up.
I guess if somebody is interested in micro-optimizing, that is probably
a good command to look at.

-Peff