Re: Applying code layout optimization to postgresql16 RPMs in Fedora 41 gave a 3%-6% improvement in IPC

William Cohen <wcohen@xxxxxxxxxx> · Thu, 21 Nov 2024 18:59:59 -0500

On 11/21/24 17:32, Miro Hrončok wrote:
> On 21. 11. 24 23:11, William Cohen wrote:
>> Sediment has been designed to work with the RPM build process.
>> Currently, one needs to use modified RPM macros.  These can be created
>> quickly by writing the output of the sediment make_sediment_rpmmacros
>> command into ~/.rpmmacros.  One will also need to set the pgo
>> macro to 1 for the rpmbuild process.  The rpm spec file has minimal
>> modifications.  It has the callgraph file stored as a source file and
>> defines the global call_graph as the source file that holds the call
>> graph.
> 
> Hey Will,
> 
> let's say I want to try this for Python. Where do I start? The README on https://github.com/wcohen/sediment is not very helpful.
> 
> This is what I did based on your email:
> 
> $ sudo dnf --enable-repo=updates-testing install sediment
> ...
> Installing sediment-0:0.9.3-1.fc41.noarch
> 
> I run make_sediment_rpmmacros, it gives me some macros. Now I am supposed to put those into ~/.rpmmacros. Except I never build Python locally, I use Koji or mock. I can probably amend this to use %global and insert it into python3.14.spec. But what else do I need to do? Do you have a step-by-step kind of document I can follow?
> 

Hi, Miro,

For the time being the builds need to have the macros defined, so koji isn't going to work.  You might be able to modify the RPM that provides the macros and have a mock environment that uses that special RPM.

It might be more productive to take some time to define what would provide good representative data for python.  If the profiling data is not representative, then the code layout changes are unlikely to provide much performance improvement.  For the postgresql16 example the training was done on pgbench.  The postgres binary is pretty large, 9.4MB in size, and there appears to be a fair amount of jumping around in the code.  The code layout optimization reduces that.  For python I was thinking that we might be able to train on something that does python benchmarking.  A microbenchmark of tight loops might not show much improvement, but maybe there are some other applications that could be used to compare the normal vs optimized builds.

So far I have only collected data on x86_64 bare metal machines using something like the following (you might need to adjust the kernel setting to allow a normal user to collect data):

  perf record -e branches:u -j any_call python_executable_under_test

Once you have the data you can convert into a form that sediment can use with:

  perf report --no-demangle --sort=comm,dso_from,symbol_from,dso_to,symbol_to > python_pgo_data.out

Then convert it into the actual call graph file with:

  perf2gv < python_pgo_data.out > python_pgo_data.gv

The python_pgo_data.gv file would be the file used for the python build.  You can also convert it into something more human readable that can be viewed in a browser with:

  dot -Tsvg -o python_pgo_data.svg python_pgo_data.gv
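
The data collection steps above can be strung together in a small driver.  The following is only a sketch: the function names (profiling_commands, run_all), the stem parameter, and the bench.py workload are made up for illustration and are not part of sediment.  It only assembles the command lines shown above and, by default, prints them instead of running them.

```python
import shlex
import subprocess


def profiling_commands(workload, stem="python_pgo_data", with_svg=False):
    """Return the command lines from the steps above, in order.

    workload is the argv list of the program to profile (hypothetical
    parameter); stem is the base name of the generated files.
    """
    out = shlex.quote(stem + ".out")
    gv = shlex.quote(stem + ".gv")
    commands = [
        # Step 1: sample user-space call branches while the workload runs.
        "perf record -e branches:u -j any_call " + shlex.join(workload),
        # Step 2: dump caller/callee pairs in a form sediment can convert.
        "perf report --no-demangle "
        "--sort=comm,dso_from,symbol_from,dso_to,symbol_to > " + out,
        # Step 3: convert the report into the call graph file.
        "perf2gv < " + out + " > " + gv,
    ]
    if with_svg:
        # Optional: render the call graph for viewing.
        commands.append("dot -Tsvg -o " + shlex.quote(stem + ".svg") + " " + gv)
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it through the shell unless dry_run."""
    for command in commands:
        print(command)
        if not dry_run:
            subprocess.run(command, shell=True, check=True)


if __name__ == "__main__":
    # Dry run only: shows what would be executed for a hypothetical bench.py.
    run_all(profiling_commands(["python3", "bench.py"], with_svg=True))
```

Passing dry_run=False would actually run the commands, which needs perf, sediment and graphviz installed and the same permissions as running them by hand.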

Taking a look at the svg file might give some ideas about what the hot and cold paths in the code are.  The ovals are the individual functions (there could be inlined functions in those, but we don't care about those).  The directed edges mark calls between the different functions.  Each edge is also labelled with the relative probability of that edge being taken.  The edge weights are used by sediment to figure out which functions should be grouped together.  All the edge weights added together should add up to 1.  This normalization makes it a bit easier to combine multiple callgraphs together.  The large rectangle around a group of functions is the binary name.  I expect that you are going to be looking at the shared library libpython3.14.so.1.0 for optimization.
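
To make the normalization point concrete, here is a minimal sketch of how edge weights that sum to 1 can be merged across training runs.  It does not read the .gv format and it is not sediment's code: the normalize and combine helpers, the plain dict representation, and the function names in the example are all assumptions made for illustration.  Averaging the runs with equal shares is also just one possible choice.

```python
from collections import defaultdict


def normalize(edge_counts):
    """Scale edge weights so that all of them together sum to 1."""
    total = sum(edge_counts.values())
    if total <= 0:
        raise ValueError("call graph has no weighted edges")
    return {edge: count / total for edge, count in edge_counts.items()}


def combine(graphs, shares=None):
    """Merge normalized graphs into one graph whose weights sum to 1.

    shares optionally gives the relative importance of each graph;
    by default every graph counts equally (an assumption of this sketch).
    """
    if shares is None:
        shares = [1.0] * len(graphs)
    merged = defaultdict(float)
    for graph, share in zip(graphs, shares):
        for edge, weight in graph.items():
            merged[edge] += share * weight
    return normalize(merged)


if __name__ == "__main__":
    # Hypothetical (caller, callee) sample counts from two training runs.
    long_run = normalize({("main", "parse"): 3000, ("parse", "lex"): 7000})
    short_run = normalize({("main", "parse"): 1, ("main", "eval"): 3})
    for edge, weight in sorted(combine([long_run, short_run]).items()):
        print(edge, round(weight, 3))
```

Because each run is scaled to 1 before merging, the long run with thousands of samples does not drown out the short one; each contributes according to its share.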

Once you have a python_pgo_data.gv file, create the .rpmmacros file:

  make_sediment_rpmmacros > ~/.rpmmacros

Edit the python3.14.spec file to include the python_pgo_data.gv file:

  SOURCE99: python_pgo_data.gv
  %global call_graph %{SOURCE99}

Then build it with:

  rpmbuild -ba --define "pgo 1" python3.14.spec

This should generate RPMs with _pgo in the RPM name.
To build baseline examples without the layout optimization:

  rpmbuild -ba python3.14.spec

I certainly would like to help get an optimized version of python built and am happy to help work through the issues.  Let me know if you have any other questions or if there are things that I can improve.

-Will
