Hello Pinku,

in short, yes, it is possible. You first have to compile your code with -fprofile-generate, then run your application on a training set of data, and afterwards recompile it a second time with -fprofile-use to make use of the generated profile. The idea is to create an application which records a profile of itself when run, feed it with enough data to be representative of most if not all use cases so that you get a meaningful profile of your application's run-time behaviour, and then use that profile to create a highly optimized application.

Ideally the training set should be short, because the instrumented code runs slowly: it has to capture and save a profile as it runs.

You can specify a directory where the profile data is stored and where it is found again later. For example, for the first compilation you would use

  mkdir $HOME/gcda-storage/
  gcc -fprofile-generate=$HOME/gcda-storage/ ...

and for the second compilation

  gcc -fprofile-use=$HOME/gcda-storage/ -O3 -flto ...

If the code is multi-threaded, you also want to add -fprofile-update=atomic, which makes sure the profile counters are updated in a thread-safe manner.

For best results, compile the second time with link-time optimization (-flto), because link-time optimization benefits a great deal from profiling data.

It is usually better to use PGO than AutoFDO, and to fall back to AutoFDO only when PGO is not possible or impractical.
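Since you mention dynamically loaded shared libraries, here is a rough end-to-end sketch of the workflow for a single library. The names foo.c, libfoo.so and the ./run_simulation command are only placeholders for your own sources and training driver; the flags are the GCC options discussed above. Note that -fprofile-generate (and, with LTO, -fprofile-use) should be passed at link time as well, so that libgcov gets linked in and the link-time optimizer sees the profile:

  # Step 1: instrumented build of the shared library
  mkdir -p $HOME/gcda-storage/
  gcc -O2 -fPIC -fprofile-generate=$HOME/gcda-storage/ -fprofile-update=atomic -c foo.c -o foo.o
  gcc -shared -fprofile-generate=$HOME/gcda-storage/ foo.o -o libfoo.so

  # Step 2: run a short but representative training workload;
  # the .gcda files end up in $HOME/gcda-storage/
  ./run_simulation --small-input    # placeholder for your own driver

  # Step 3: rebuild with the profile, plus -O3 and LTO
  gcc -O3 -flto -fPIC -fprofile-use=$HOME/gcda-storage/ -c foo.c -o foo.o
  gcc -O3 -flto -shared -fprofile-use=$HOME/gcda-storage/ foo.o -o libfoo.so

The same recipe applies to each library you want to optimize; the .gcda file names encode the path of the object file they belong to, so several instrumented libraries should be able to share one profile directory without clobbering each other.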
Cheers,
Sven

On 10/05/2019 21:40, Pinku Surana wrote:
Hi, I have a very large application similar to SciPy. There are many shared libraries loaded dynamically. I’d like to use PGO/AutoFDO to optimize the compilation of these libraries. Is this possible? I repeatedly run simulations that take hundreds of CPU hours. It would be worth the effort to squeeze 10% more performance from these libraries for this particular task. Statically compiling everything is not likely, sadly. Thanks.