Hello Pinku,

in short, yes, it is possible. You first have to compile your code with -fprofile-generate, then run your application on a training set of data, and afterwards recompile it a second time with -fprofile-use to make use of the generated profile. The idea is to create an application which records a profile of itself when run, feed it with enough data to be representative of most if not all use cases so that you get a meaningful profile of your application's run-time behaviour, and then use that profile to create a highly optimized application.

Ideally the training set should be short, because the instrumented code runs slowly: it has to capture and save a profile as it runs.

You can specify a directory where the profile data is stored and where it is found again later. For example, for the first compilation you would use

  mkdir $HOME/gcda-storage/
  gcc -fprofile-generate=$HOME/gcda-storage/ ...

and for the second compilation

  gcc -fprofile-use=$HOME/gcda-storage/ -O3 -flto ...

If the code is multi-threaded, you also want to add -fprofile-update=atomic, which makes sure the profile counters are updated in a thread-safe manner.

For best results, compile the second time with link-time optimization (-flto), because link-time optimization benefits a great deal from profiling data.

It is usually better to use PGO than AutoFDO, and to fall back to AutoFDO only when PGO is not possible or impractical.
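Since you mention dynamically loaded shared libraries, here is a rough end-to-end sketch of the workflow for a single library. The names foo.c, libfoo.so and the ./run_simulation command are only placeholders for your own sources and training driver; the flags are the GCC options discussed above. Note that -fprofile-generate (and, with LTO, -fprofile-use) should be passed at link time as well, so that libgcov gets linked in and the link-time optimizer sees the profile:

  # Step 1: instrumented build of the shared library
  mkdir -p $HOME/gcda-storage/
  gcc -O2 -fPIC -fprofile-generate=$HOME/gcda-storage/ -fprofile-update=atomic -c foo.c -o foo.o
  gcc -shared -fprofile-generate=$HOME/gcda-storage/ foo.o -o libfoo.so

  # Step 2: run a short but representative training workload;
  # the .gcda files end up in $HOME/gcda-storage/
  ./run_simulation --small-input    # placeholder for your own driver

  # Step 3: rebuild with the profile, plus -O3 and LTO
  gcc -O3 -flto -fPIC -fprofile-use=$HOME/gcda-storage/ -c foo.c -o foo.o
  gcc -O3 -flto -shared -fprofile-use=$HOME/gcda-storage/ foo.o -o libfoo.so

The same recipe applies to each library you want to optimize; the .gcda file names encode the path of the object file they belong to, so several instrumented libraries should be able to share one profile directory without clobbering each other.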
Cheers,
Sven

On 10/05/2019 21:40, Pinku Surana wrote:
Hi, I have a very large application similar to SciPy. There are many shared libraries loaded dynamically. I’d like to use PGO/AutoFDO to optimize the compilation of these libraries. Is this possible? I repeatedly run simulations that take hundreds of CPU hours. It would be worth the effort to squeeze 10% more performance from these libraries for this particular task. Statically compiling everything is not likely, sadly. Thanks.