Re: [PATCH 2/3] upload-pack: skip parse-object re-hashing of "want" objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 9/6/2022 7:05 PM, Jeff King wrote:
> This patch teaches both code paths to use the new SKIP_HASH_CHECK flag
> for parse_object(). You can see the speed-up in p5600, which does a
> blob:none clone followed by a checkout. The savings for git.git are
> modest:
> 
>   Test                          HEAD^             HEAD
>   ----------------------------------------------------------------------
>   5600.3: checkout of result    2.23(4.19+0.24)   1.72(3.79+0.18) -22.9%
> 
> But the savings scale with the number of bytes. So on a repository like
> linux.git with more files, we see more improvement (in both absolute and
> relative numbers):
> 
>   Test                          HEAD^                HEAD
>   ----------------------------------------------------------------------------
>   5600.3: checkout of result    51.62(77.26+2.76)    34.86(61.41+2.63) -32.5%
> 
> And here's an even more extreme case. This is the android gradle-plugin
> repository, whose tip checkout has ~3.7GB of files:
> 
>   Test                          HEAD^               HEAD
>   --------------------------------------------------------------------------
>   5600.3: checkout of result    79.51(90.84+5.55)   40.28(51.88+5.67) -49.3%
> 
> Keep in mind that these timings are of the whole checkout operation.

These numbers are _very_ impressive for this user-facing scenario.

>   Benchmark 1: GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input
>     Time (mean ± σ):     50.884 s ±  0.239 s    [User: 51.450 s, System: 1.726 s]
>     Range (min … max):   50.608 s … 51.025 s    3 runs
> 
>   Benchmark 2: GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input
>     Time (mean ± σ):      9.728 s ±  0.112 s    [User: 10.466 s, System: 1.535 s]
>     Range (min … max):    9.618 s …  9.842 s    3 runs
> 
>   Summary
>     'GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input' ran
>       5.23 ± 0.07 times faster than 'GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input'
> 
> So a server would see an 80% reduction in CPU serving the initial
> checkout of a partial clone for this repository.

Very nice!

> In both cases skipping the extra hashing on the server should be pretty
> safe. The client doesn't trust the server anyway, so it will re-hash all
> of the objects via index-pack.

Definitely safe for this target scenario.

> There is one thing to note, though: the
> change in get_reference() affects not just pack-objects, but rev-list,
> git-log, etc. We could use a flag to limit to index-pack here, but we
> may already skip hash checks in this instance. For commits, we'd skip
> anything we load via the commit-graph. And while before this commit we
> would check a blob fed directly to rev-list on the command-line, we'd
> skip checking that same blob if we found it by traversing a tree.

It's worth also mentioning that since you isolated the change to
get_reference(), it is only skipping the parse for the requested tips
of the revision walk. As we follow commits and trees to other objects
we do not use parse_object() but instead use the appropriate method
(parse_commit(), parse_tree(), and do not even parse blobs).

This is additionally confirmed that p0001-rev-list.sh has no measurable
change in results with this commit, even when disabling the commit-graph.

> The exception for both is if --verify-objects is used. In that case,
> we'll skip this optimization, and the new test makes sure we do this
> correctly.

The test for this is clever. Thanks for adding it.

Code and test look good.

Thanks,
-Stolee



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux