On Mon, Apr 10, 2017 at 09:14:02PM +0000, git@xxxxxxxxxxxxxxxxx wrote:

> From: Jeff Hostetler <jeffhost@xxxxxxxxxxxxx>
>
> Created t/perf/repos/many-files.sh to generate large, but
> artificial repositories.

I think this is a good direction. In the long run we might want some
kind of magic to pull from the "library" of repos when running perf
tests, but it's not a big deal to run the script manually and point
GIT_PERF_REPO at the result.

As a bonus, this should be faster when running perf tests, since we can
reuse the built repo when testing each version of Git.

> +## This test measures the performance of various read-tree
> +## and checkout operations. It is primarily interested in
> +## the algorithmic costs of index operations and recursive
> +## tree traversal -- and NOT disk I/O on thousands of files.
> +## Therefore, it uses sparse-checkout to avoid populating
> +## the ballast files.
> +##
> +## It expects the test repo to have certain characteristics.
> +## Branches:
> +## () master := an arbitrary commit.
> +## () ballast := an arbitrary commit with a large number
> +## of changes relative to "master".
> +## () ballast-alias := a branch pointing to the same commit
> +## as "ballast".
> +## () ballast-1 := a commit with a 1 file difference from
> +## "ballast".

I'm OK with leaving these requirements on the repo in the name of
simplicity, though it does make it harder to perf-test against a
regular repo. I wonder if we could make reasonable guesses, like:

  master        => HEAD
  ballast       => $(git rev-list HEAD | tail -n 1)
  ballast-alias => git branch $ballast
  ballast-1     => HEAD^

That would approximate your conditions in a real-world repository, and
it should be easy to make your synthetic one fit the bill exactly.

I don't know if you'd want to turn on sparse checkout manually or not
when testing a real-world repo.

-Peff
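
[Editor's note: the guessing heuristic above could be scripted roughly as
below. This is a hedged sketch, not code from the thread; the tiny
throwaway repo merely stands in for the real-world repository you would
point GIT_PERF_REPO at, and forcing the branch names is an assumption
about how the perf test would consume them.]

```shell
#!/bin/sh
# Sketch: approximate the branch layout the perf test expects, using the
# guesses from the mail. A disposable repo stands in for a real one.
set -e

repo=$(mktemp -d)
cd "$repo"
git -c init.defaultBranch=main init -q
for i in 1 2 3; do
    git -c user.name=t -c user.email=t@example.com \
        commit -q --allow-empty -m "commit $i"
done
git checkout -q --detach               # so any branch can be force-updated

root=$(git rev-list HEAD | tail -n 1)  # oldest reachable commit

git branch -f master HEAD              # master        => HEAD
git branch -f ballast "$root"          # ballast       => root commit
git branch -f ballast-alias ballast    # ballast-alias => same commit as ballast
git branch -f ballast-1 HEAD^          # ballast-1     => HEAD^

# sanity check: the alias points at the same commit as ballast
test "$(git rev-parse ballast-alias)" = "$(git rev-parse ballast)" && echo ok
```

On a real repository the same four `git branch -f` lines apply as-is;
only the throwaway-repo setup at the top would be dropped.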