Re: Applying code layout optimization to postgresql16 RPMs in Fedora 41 gave a 3%-6% improvement in IPC

On 01. 02. 25 2:10, William Cohen wrote:
On 12/5/24 9:45 AM, Miro Hrončok wrote:
On 04. 12. 24 20:32, William Cohen wrote:
On 11/21/24 17:32, Miro Hrončok wrote:
On 21. 11. 24 23:11, William Cohen wrote:
Sediment has been designed to work with the RPM build process.
Currently, one needs to use modified RPM macros.  These can be created
quickly by writing the output of the sediment make_sediment_rpmmacros
command into ~/.rpmmacros.  One will also need to set the pgo macro to
1 for the rpmbuild process.  The rpm spec file needs only minimal
modifications: it has the call graph stored as a source file and
defines the global call_graph macro pointing to that source file.
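
In shell terms, for a local rpmbuild setup, that boils down to roughly
the following (the package name is only a placeholder):

     make_sediment_rpmmacros > ~/.rpmmacros
     rpmbuild -ba --define "pgo 1" <package>.spec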

Hey Will,

let's say I want to try this for Python. Where do I start? The README on https://github.com/wcohen/sediment is not very helpful.

This is what I did based on your email:

$ sudo dnf --enable-repo=updates-testing install sediment
...
Installing sediment-0:0.9.3-1.fc41.noarch

I ran make_sediment_rpmmacros and it gave me some macros. Now I am supposed to put those into ~/.rpmmacros. Except I never build Python locally, I use Koji or mock. I can probably amend this to use %global and insert it into python3.14.spec. But what else do I need to do? Do you have a step-by-step kind of document I can follow?



Hi Miro,

The tooling doesn't yet fit your workflow of building packages in
koji and mock.  I am looking into ways of addressing that issue.

In an earlier email I mentioned that the important thing is to have
good profiling data.  Do you have suggestions on some benchmarks that
would properly exercise the python interpreter?  I have used
pyperformance (https://github.com/python/pyperformance) to get some
call graph data for python and added that to a python3.13 srpm
available at
https://koji.fedoraproject.org/koji/taskinfo?taskID=126526066.  Note
that Koji is NOT building with the code layout optimization.  One
would still need to build python3.13-3.13.0-1.fc41.src.rpm locally
with sediment-0.9.4
(https://koji.fedoraproject.org/koji/buildinfo?buildID=2596791)
installed and ~/.rpmmacros set up, following these steps:

     make_sediment_rpmmacros > ~/.rpmmacros
     rpm -Uvh python3.13-3.13.0-1.fc41.src.rpm
     cd ~/rpmbuild/SPECS
     rpmbuild -ba --define "pgo 1" python3.13.spec

The notable difference in the python3.13.spec file is the addition of:

# Call graph information
Source12: perf_pybenchmark.gv
%global call_graph %{SOURCE12}

The perf_pybenchmark.gv was generated with the following steps:

     python3 -m pip install pyperformance
     perf record -e branches:u -j any_call -o perf_pybenchmark.data pyperformance run -f -o fc41_x86_python_baseline.json
     perf report -i perf_pybenchmark.data --no-demangle --sort=comm,dso_from,symbol_from,dso_to,symbol_to > perf_pybenchmark.out
     perf2gv < perf_pybenchmark.out > perf_pybenchmark.gv
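
As a sanity check, the .gv output appears to be in Graphviz format, so
it should be viewable with something like the following (assuming the
graphviz package is installed):

     dot -Tsvg perf_pybenchmark.gv -o perf_pybenchmark.svg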

Added the file to the python srpm:

     cp perf_pybenchmark.gv ~/rpmbuild/SOURCES/.
     # edit ~/rpmbuild/SPECS/python3.13.spec to add call graph info

The improvements were mixed between the code layout optimized python
and the baseline version of the pyperformance benchmarks.  This can be
seen in the attached python_pgo.out generated by:

     python3 -m pyperf compare_to fc41_x86_python_baseline.json fc41_x86_python_pgo.json --table > python_pgo.out

It looks like a number of the benchmarks are microbenchmarks that are
unlikely to benefit much from the code layout optimizations.

Are there other python performance tests that you would suggest that
have a larger footprint and would better gauge the possible
performance improvement from the code layout optimization?

Are there better python code examples to collect profiling data on?

Hey Will,

thanks for looking into this.

For your question: Upstream is using this for PGO:

   $ python3.14 -m test --pgo

Or:

   $ python3.14 -m test --pgo-extended

In spec, this can be used:

   LD_LIBRARY_PATH=./build/optimized ./build/optimized/python -m test ...
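
Presumably the same workload could feed sediment's call graph
collection too, e.g. something along these lines (untested; the
perf_pgotest.* file names are just placeholders):

   perf record -e branches:u -j any_call -o perf_pgotest.data python3.14 -m test --pgo
   perf report -i perf_pgotest.data --no-demangle --sort=comm,dso_from,symbol_from,dso_to,symbol_to > perf_pgotest.out
   perf2gv < perf_pgotest.out > perf_pgotest.gv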

---

What is the blocker to run this in Koji/mock?

You do `make_sediment_rpmmacros > ~/.rpmmacros`.

What's the issue with %defining such macros at spec level?
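
I.e. instead of a per-user ~/.rpmmacros, put something like this
directly in the spec (just an illustration; I have not checked what
make_sediment_rpmmacros actually emits):

   %global pgo 1
   %global call_graph %{SOURCE12}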


Hi,

I was able to do some experiments with the koji/mock buildable python3.13-3.13.0-1.fc41_opt.src.rpm (https://koji.fedoraproject.org/koji/taskinfo?taskID=128437060) and get better measurements of the performance impact with vstinner's suggestions for profiling python. On a Lenovo P51 laptop running Fedora 41 I built two versions of the rpms. The training data was collected from a pyperformance run and analyzed with the sediment tooling:

    python3 -m pip install pyperformance
    perf record -e branches:u -j any_call -o perf_pybenchmark.data pyperformance run -f -o fc41_x86_python_baseline.json
    perf report -i perf_pybenchmark.data --no-demangle --sort=comm,dso_from,symbol_from,dso_to,symbol_to > perf_pybenchmark.out
    perf2gv < perf_pybenchmark.out > perf_pybenchmark.gv

Installed the srpm, went into the SPECS directory, and built the code layout optimized RPMs (these have an added _opt in their names) with:

    rpm -Uvh python3.13-3.13.0-1.fc41_opt.src.rpm
    cd ~/rpmbuild/SPECS
    rpmbuild -ba python3.13.spec

Built RPMs without the code-layout optimization (no _opt in the RPM names):

   rpmbuild --without opt -ba python3.13.spec
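
For reference, a build conditional along these lines would let the
same spec build both variants (a sketch, not the literal spec
contents):

   %bcond_without opt
   %if %{with opt}
   %global call_graph %{SOURCE12}
   %endif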

Installed the code-layout-optimized RPMs, set up the environment for benchmarking, and ran the tests:

   sudo dnf install ~/rpmbuild/RPMS/x86_64/python*fc41_opt* ~/rpmbuild/RPMS/noarch/python-unversioned-command-3.13.0-1.fc41_opt.noarch.rpm
   sudo python3 -m pyperf system tune
   pyperformance run -f -o fc41_x86_python_opt20250131.json >& fc41_pybench_opt_20250131.log

Then collected data for the non-optimized version of the rpms:

   sudo dnf install ~/rpmbuild/RPMS/x86_64/python*fc41.* ~/rpmbuild/RPMS/noarch/python-unversioned-command-3.13.0-1.fc41.noarch.rpm
   sudo python3 -m pyperf system tune
   pyperformance run -f -o fc41_x86_python_20250131.json >& fc41_pybench_20250131.log
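
To double-check which build is actually installed before each
benchmarking run:

   rpm -q python3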

Once done, I compared the data between the runs with:

  python3 -m pyperf compare_to fc41_x86_python_20250131.json fc41_x86_python_opt20250131.json --table > python_opt.out

Below is the comparison between the two versions (python_opt.out). For the vast majority of the benchmarks the optimized code is slightly faster, typically by about 1%.  The regex_* benchmarks showed the largest benefit, with regex_dna being 1.04x faster.  Several benchmarks are slightly slower: pickle, pickle_dict, create_gc_cycles, spectral_norm, and typing_runtime_protocols.  The unpack_sequence benchmark was the worst, being 1.12x slower with the optimized code.  The improvements are not as noticeable as what was seen with postgresql.  I suspect this is because pyperformance consists largely of microbenchmarks and does not put as much pressure on the iTLB as the large postgresql binary does.

Thank you, Will!

I've CC'ed Charalampos, who is now looking into Python performance in Fedora+EL.

--
Miro Hrončok
--
Phone: +420777974800
Fedora Matrix: mhroncok




