Re: Building with PGO: concurrency and test data

Jeff King <peff@xxxxxxxx> · Tue, 23 Apr 2024 18:42:32 -0400

On Sun, Apr 21, 2024 at 02:52:48AM +0200, intelfx@xxxxxxxxxxxx wrote:

> 1. The INSTALL doc says that the profiling pass has to run the test
> suite using a single CPU, and the Makefile `profile` target also
> encodes this rule:
> 
> > As a caveat: a profile-optimized build takes a *lot* longer since the
> > git tree must be built twice, and in order for the profiling
> > measurements to work properly, ccache must be disabled and the test
> > suite has to be run using only a single CPU. <...>
> ( https://github.com/git/git/blob/master/INSTALL#L54-L59 )
> [...]
> However, some cursory searching tells me that gcc is equipped to handle
> concurrent runs of an instrumented program:

That text was added quite a while ago, in f2d713fc3e (Fix build problems
related to profile-directed optimization, 2012-02-06). It may be that it
was a problem back then, but isn't anymore.

+cc the author of that commit; I don't know offhand how many people
use "make profile" (now or back then).

> 2. The performance test suite (t/perf/) uses up to two git repositories
> ("normal" and "large") as test data to run git commands against. Does
> the internal organization of these repositories matter? I.e., does it
> matter if those are "real-world-used" repositories with overlapping
> packs, cruft, loose objects, many refs etc., or can I simply use fresh
> clones of git.git and linux.git without loss of profile quality?

I'd be surprised if the choice of repository didn't have some impact.
After all, if there are no loose objects, then the routines that
interact with them are not going to get a lot of exercise. But how much
does it actually matter in practice? I think you'd have to do a bunch of
trial and error measurements to find out.

My gut is that "larger is better" to emphasize the hot loops, but even
that might not be true. The main reason we want "large" repos in some
perf scripts is that it makes it easier to measure the thing we are
speeding up versus the overhead of starting processes, etc. But PGO
might not be as sensitive to that, if it can get what it needs from a
smaller number of runs of the sensitive spots.

All of which is to say "no idea". I know that's not very satisfying, but
I don't recall anybody really discussing PGO much here in the last
decade, so I think you're largely on your own.

-Peff