On Thu, Oct 31, 2024 at 10:57:02AM +0000, Peter Robinson wrote:
> On Thu, 31 Oct 2024 at 07:32, Jakub Jelinek <jakub@xxxxxxxxxx> wrote:
> >
> > On Wed, Oct 30, 2024 at 10:46:01PM -0400, Neal Gompa wrote:
> > > I know the idea of moving to -O3 has been briefly mentioned before in
> > > other contexts when we've discussed uplifting the flags, but it looks
> > > like Ubuntu is moving to -O3 for Ubuntu 25.04[1]. Is there a reason
> > > why we shouldn't consider doing the same for Fedora Linux 42?
> >
> > Yes, this is a very bad idea.
> >
> > -O3 significantly increases code size (and the speed up gains aren't really
> > guaranteed), which is highly undesirable when the vast majority of code in
> > the distro isn't performance critical; the I-cache footprint is then more
> > important. Especially when -O2 in GCC has already performed some
> > vectorization for years and it is going to be used more at -O2 in GCC 15
> > (but the heuristics for -O2 vectorization are to avoid significant code
> > size increases).
> >
> > -O3 should be used just for performance critical code, which is found to be
> > hot in profiling and where it is proved to help performance of the code.
> > Better yet, performance critical code should use PGO (profile guided
> > optimizations) so that only the hot parts of the code are automatically
> > optimized for speed and cold parts for size. With reasonable workload used
> > during package build for the profile feedback generation.
> >
> > There are some packages in the distro which use PGO (e.g. gcc itself), but
> > I think e.g. SUSE packages use PGO far more often than in Fedora.
>
> How does SUSE choose to use PGO? Is that a manual process where the
> maintainer chooses to turn it on for a particular library or binary?

I think so. It needs some package specific know-how.
Basically, one needs to build the binaries and/or libraries with the
-fprofile-generate flag in addition to the normal compiler flags, then run
the binaries with the libraries on some typical workload, and finally
rebuild with the -fprofile-use flag in place of -fprofile-generate.
It can be combined with LTO, and LTO+PGO generally results in better
speedups than either one alone.

Testsuites often aren't very good workloads because they usually spend more
time on unlikely corner cases than on the common cases, but they can be used
when there isn't anything better. E.g. GCC, when building itself, has
building itself as the workload, so it is trained mostly on C++ code and far
less on C code; for shells I'd imagine one should run a few configure
scripts with the newly built shell, etc.

I guess one could look at SUSE debuginfo packages, see which packages have
been built with -fprofile-use, and learn from their spec files (if the
corresponding packages in Fedora don't already use that).

Where PGO isn't ideal is multi-versioned code, where the binaries/libraries
decide based on hardware capabilities which variant of the code to use.
During the training run that decision is made from the build box's
capabilities, so some code can look unused even though it could be hot on
other hardware.

	Jakub
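
For illustration only, the build / train / rebuild cycle described above can be sketched as a small Python helper that assembles the three command lines without running them. The flags -fprofile-generate, -fprofile-use and -flto are the GCC options named in the mail; everything else (the function name pgo_commands, the files app.c and typical-input.txt, -O2 as the "normal" flags, and invoking gcc directly rather than through a spec file) is a hypothetical assumption, not how any particular package does it.

```python
import shlex

# Assumption: the package's normal optimization flags are just -O2.
BASE_FLAGS = ["-O2"]


def pgo_commands(sources, output, workload_args, use_lto=False):
    """Return the three command lines of one PGO cycle, in order.

    1. instrumented build (-fprofile-generate)
    2. training run on a typical workload (writes the profile data)
    3. optimized rebuild (-fprofile-use instead of -fprofile-generate)
    """
    flags = BASE_FLAGS + (["-flto"] if use_lto else [])
    instrumented = ["gcc", *flags, "-fprofile-generate", *sources, "-o", output]
    training = [f"./{output}", *workload_args]
    optimized = ["gcc", *flags, "-fprofile-use", *sources, "-o", output]
    return [instrumented, training, optimized]


if __name__ == "__main__":
    # Hypothetical single-file program and workload file.
    for cmd in pgo_commands(["app.c"], "app", ["typical-input.txt"], use_lto=True):
        print(shlex.join(cmd))
```

Run as a script, this only prints the three commands so they can be reviewed or pasted into a build recipe. In a real package the training step would be whatever workload is representative for that package, which is the package specific know-how mentioned above.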