On 9/6/2022 7:05 PM, Jeff King wrote: > This patch teaches both code paths to use the new SKIP_HASH_CHECK flag > for parse_object(). You can see the speed-up in p5600, which does a > blob:none clone followed by a checkout. The savings for git.git are > modest: > > Test HEAD^ HEAD > ---------------------------------------------------------------------- > 5600.3: checkout of result 2.23(4.19+0.24) 1.72(3.79+0.18) -22.9% > > But the savings scale with the number of bytes. So on a repository like > linux.git with more files, we see more improvement (in both absolute and > relative numbers): > > Test HEAD^ HEAD > ---------------------------------------------------------------------------- > 5600.3: checkout of result 51.62(77.26+2.76) 34.86(61.41+2.63) -32.5% > > And here's an even more extreme case. This is the android gradle-plugin > repository, whose tip checkout has ~3.7GB of files: > > Test HEAD^ HEAD > -------------------------------------------------------------------------- > 5600.3: checkout of result 79.51(90.84+5.55) 40.28(51.88+5.67) -49.3% > > Keep in mind that these timings are of the whole checkout operation. These numbers are _very_ impressive for this user-facing scenario. > Benchmark 1: GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input > Time (mean ± σ): 50.884 s ± 0.239 s [User: 51.450 s, System: 1.726 s] > Range (min … max): 50.608 s … 51.025 s 3 runs > > Benchmark 2: GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input > Time (mean ± σ): 9.728 s ± 0.112 s [User: 10.466 s, System: 1.535 s] > Range (min … max): 9.618 s … 9.842 s 3 runs > > Summary > 'GIT_PROTOCOL=version=2 git.new upload-pack ../gradle-plugin <input' ran > 5.23 ± 0.07 times faster than 'GIT_PROTOCOL=version=2 git.old upload-pack ../gradle-plugin <input' > > So a server would see an 80% reduction in CPU serving the initial > checkout of a partial clone for this repository. Very nice! > In both cases skipping the extra hashing on the server should be pretty > safe. The client doesn't trust the server anyway, so it will re-hash all > of the objects via index-pack. Definitely safe for this target scenario. > There is one thing to note, though: the > change in get_reference() affects not just pack-objects, but rev-list, > git-log, etc. We could use a flag to limit to index-pack here, but we > may already skip hash checks in this instance. For commits, we'd skip > anything we load via the commit-graph. And while before this commit we > would check a blob fed directly to rev-list on the command-line, we'd > skip checking that same blob if we found it by traversing a tree. It's worth also mentioning that since you isolated the change to get_reference(), it is only skipping the parse for the requested tips of the revision walk. As we follow commits and trees to other objects we do not use parse_object() but instead use the appropriate method (parse_commit(), parse_tree(), and do not even parse blobs). This is additionally confirmed that p0001-rev-list.sh has no measurable change in results with this commit, even when disabling the commit-graph. > The exception for both is if --verify-objects is used. In that case, > we'll skip this optimization, and the new test makes sure we do this > correctly. The test for this is clever. Thanks for adding it. Code and test look good. Thanks, -Stolee