Applying code layout optimization to postgresql16 RPMs in Fedora 41 gave a 3%-6% improvement in IPC

William Cohen <wcohen@xxxxxxxxxx> · Thu, 21 Nov 2024 17:11:48 -0500

Hi All,

There is always a desire to make code run faster and use less memory.
Code layout optimization is one technique used to accomplish those
goals.  It places strongly connected functions (functions that
frequently call one another) near each other.  This reduces the number
of memory pages containing actively executing code and reduces the
amount of time that processors spend translating logical to physical
memory addresses.  With the addition of binutils-2.43 in Fedora 41, it
is possible to do profile guided optimization (PGO) of code layout with
the sediment (*) tool.
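To make the page-count argument concrete, here is a minimal Python
sketch.  It orders functions with a simplified greedy chain merge,
loosely in the style of Pettis-Hansen procedure ordering: call edges
are visited from heaviest to lightest, and the caller's and callee's
chains are concatenated.  This is an assumed technique for
illustration, not necessarily the algorithm sediment uses.  The
function names, sizes, call counts, and 4 KiB page size are all
hypothetical.  The sketch counts how many pages hold hot code before
and after reordering.  Fewer hot pages also means fewer instruction
TLB entries are needed to cover them.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

# Hypothetical call graph: (caller, callee, call count)
EDGES = [
    ("main", "parse", 1000),
    ("parse", "exec", 900),
    ("main", "init_config", 1),
    ("exec", "error_report", 1),
    ("main", "cleanup", 1),
]
# Hypothetical source order and sizes (bytes); main/parse/exec are hot.
SOURCE_ORDER = ["main", "init_config", "parse", "error_report", "exec", "cleanup"]
SIZES = {f: 2048 for f in SOURCE_ORDER}
HOT = {"main", "parse", "exec"}


def order_functions(edges):
    """Simplified greedy chain merge: heaviest call edges first, so
    frequent caller/callee pairs end up adjacent in the layout."""
    chain_of = {}  # function -> the chain (list) that currently holds it
    for caller, callee, _ in edges:
        for f in (caller, callee):
            chain_of.setdefault(f, [f])
    # sorted() is stable, so equal-weight edges keep their input order.
    for caller, callee, _ in sorted(edges, key=lambda e: e[2], reverse=True):
        a, b = chain_of[caller], chain_of[callee]
        if a is b:
            continue  # already in the same chain
        a.extend(b)  # append the callee's chain after the caller's
        for f in b:
            chain_of[f] = a
    # Emit each distinct chain once, in first-seen order.
    seen, order = set(), []
    for chain in chain_of.values():
        if id(chain) not in seen:
            seen.add(id(chain))
            order.extend(chain)
    return order


def pages_touched(order, sizes, hot):
    """Count distinct pages holding any hot function when functions
    are laid out back to back in `order`."""
    pages, offset = set(), 0
    for f in order:
        if f in hot:
            last = offset + sizes[f] - 1
            pages.update(range(offset // PAGE_SIZE, last // PAGE_SIZE + 1))
        offset += sizes[f]
    return len(pages)


print(pages_touched(SOURCE_ORDER, SIZES, HOT))           # 3
print(pages_touched(order_functions(EDGES), SIZES, HOT))  # 2
```

In source order the three hot functions are interleaved with cold ones
and span three pages.  After reordering they are packed into two.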

As a test case I collected profiling data on an Intel machine, created
a call graph from the profiling data, built postgresql16 RPMs with and
without the layout optimization, and benchmarked the RPMs.  I measured
a 3% to 6% improvement in IPC (instructions per cycle) for pgbench
running in various environments (bare metal x86_64, an x86_64 guest VM,
and aarch64), enough of an improvement that people might be interested.
A newer version of sediment for Fedora 41 (**) is currently in Bodhi
testing; its documentation includes more details on this postgresql
pgbench experiment.  The same information is available in the upstream
sediment (*) repository at
https://github.com/wcohen/sediment/blob/master/docs/pop.rst.
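IPC is instructions retired divided by CPU cycles.  Here is a small
Python sketch of how a percentage IPC improvement is computed.  The
counter values are invented for illustration and are not from the
postgresql16 experiment.  Totals like these can be collected on Linux
with `perf stat -e instructions,cycles`, but that is an assumption about
methodology, not a description of how the numbers above were gathered.

```python
def ipc(instructions, cycles):
    """Instructions per cycle: instructions retired / CPU cycles."""
    return instructions / cycles


def pct_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after / before - 1.0) * 100.0


# Hypothetical counter totals for a baseline and an optimized build
# running the same workload (same instruction count).
base = ipc(instructions=1_000_000_000, cycles=800_000_000)  # 1.25
opt = ipc(instructions=1_000_000_000, cycles=760_000_000)

print(f"IPC {base:.3f} -> {opt:.3f}: {pct_change(base, opt):+.2f}%")
print(f"cycles: {pct_change(800_000_000, 760_000_000):+.2f}%")
```

When the instruction count is unchanged, a cycle reduction of r shows
up as an IPC gain of 1/(1 - r) - 1.  With the values above, 5% fewer
cycles is roughly a 5.26% IPC improvement.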

Sediment has been designed to work with the RPM build process.
Currently, it requires modified RPM macros.  These can be created
quickly by writing the output of the sediment make_sediment_rpmmacros
command into ~/.rpmmacros.  The pgo macro must also be set to 1 for
the rpmbuild process.  The RPM spec file needs only minimal
modifications: the call graph file is stored as a source file, and the
global macro call_graph is defined to point to that source file.
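The steps above can be scripted.  The Python helpers below are a
minimal sketch under stated assumptions.  They run the
make_sediment_rpmmacros command named above and pass the pgo macro
with rpmbuild's standard --define option.  The exact spec lines, the
Source number, and the call graph file name are hypothetical; consult
sediment's own documentation for the authoritative spec changes.

```python
import subprocess
from pathlib import Path


def write_sediment_macros(path=None):
    """Write the output of make_sediment_rpmmacros into ~/.rpmmacros.
    Note: this overwrites any existing macros file."""
    target = Path(path) if path else Path.home() / ".rpmmacros"
    macros = subprocess.run(
        ["make_sediment_rpmmacros"],
        check=True, capture_output=True, text=True,
    ).stdout
    target.write_text(macros)


def rpmbuild_command(spec_file, pgo=True):
    """Build the rpmbuild argument list, setting the pgo macro to 1."""
    cmd = ["rpmbuild", "-ba"]
    if pgo:
        cmd += ["--define", "pgo 1"]
    cmd.append(spec_file)
    return cmd


def spec_callgraph_lines(source_number, callgraph_file):
    """Hypothetical spec-file additions: ship the call graph as a
    source file and point the call_graph global macro at it."""
    return [
        f"Source{source_number}: {callgraph_file}",
        f"%global call_graph %{{SOURCE{source_number}}}",
    ]
```

For example, call write_sediment_macros() once, add the lines from
spec_callgraph_lines(99, "postgresql16.callgraph") to the spec file,
then run subprocess.run(rpmbuild_command("postgresql16.spec"),
check=True).  Returning an argument list instead of running rpmbuild
directly keeps the command easy to inspect before a long build.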

The approach that sediment uses differs significantly from other code
layout optimization tools such as the Intel Thin Layout Optimizer (+),
Google Propeller (++), and Meta BOLT (+++).  These tools make it
difficult to apply data collected on one architecture to other
architectures.  Also, the data they collect is mapped back to line
numbers, so it becomes stale quickly as patches are applied or the
software is rebased.  Sediment instead uses function names, which
change less frequently in source code.  This allows data from one run
to still be used for later versions of the software and across
multiple architectures.
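Here is a toy Python illustration of the staleness argument.  It
assumes a profile stored either as sample counts per source line or as
sample counts per function name.  The function names, line numbers,
and sample counts are hypothetical, and real tools record profiles in
their own formats.  The sketch only shows why line-keyed data can be
misattributed after a patch shifts code, while name-keyed data still
matches.

```python
# Hypothetical start lines of each function in two versions of a file.
# Version 2 adds 15 lines just before exec_plan.
V1_STARTS = {"parse_query": 10, "exec_plan": 40, "log_error": 90}
V2_STARTS = {"parse_query": 10, "exec_plan": 55, "log_error": 105}

# The same hypothetical version-1 profile, keyed two ways.
LINE_SAMPLES = {12: 500, 42: 900}                      # line -> samples
NAME_SAMPLES = {"parse_query": 500, "exec_plan": 900}  # function -> samples


def function_at(line, starts):
    """Return the function containing `line`, assuming functions are
    contiguous, so each one ends where the next one starts."""
    owner = None
    for name, start in sorted(starts.items(), key=lambda kv: kv[1]):
        if start <= line:
            owner = name
    return owner


def attribute(line_samples, starts):
    """Fold line-keyed samples into per-function totals for a version."""
    totals = {}
    for line, samples in line_samples.items():
        fn = function_at(line, starts)
        totals[fn] = totals.get(fn, 0) + samples
    return totals


print(attribute(LINE_SAMPLES, V1_STARTS))  # matches the version profiled
print(attribute(LINE_SAMPLES, V2_STARTS))  # line 42 now lands in parse_query
# Name-keyed data still refers to functions that exist in version 2.
print(all(name in V2_STARTS for name in NAME_SAMPLES))
```

After the patch, the hot samples recorded at line 42 are credited to
parse_query instead of exec_plan.  The name-keyed profile is unaffected.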

I would like people to give sediment a try in Fedora to see what
additional performance improvements can be obtained for application
code.  I would also like to get feedback on how to further improve
sediment.

-Will Cohen

(*) https://github.com/wcohen/sediment
(**) https://bodhi.fedoraproject.org/updates/FEDORA-2024-7f5ea0f053
(+) https://github.com/intel/thin-layout-optimizer/commits/main/
(++) https://github.com/google/llvm-propeller
(+++) https://github.com/llvm/llvm-project/blob/main/bolt/docs/OptimizingLinux.md

-- 
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
To unsubscribe send an email to devel-leave@xxxxxxxxxxxxxxxxxxxxxxx
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx
Do not reply to spam, report it: https://pagure.io/fedora-infrastructure/new_issue