Add the build support for using Clang's Propeller optimizer. Like
AutoFDO, Propeller uses hardware sampling to gather information
about the frequency of execution of different code paths within a
binary. This information is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary.

The support requires the Clang compiler from LLVM 19 or later, and
the create_llvm_prof tool
(https://github.com/google/autofdo/releases/tag/v0.30.1). This
submission is limited to x86 platforms that support PMU features
like LBR on Intel machines and AMD Zen3 BRS.

For Arm, we plan to send patches for SPE-based Propeller when
AutoFDO for Arm is ready.

Here is an example workflow for building an AutoFDO+Propeller
optimized kernel:

1) Build the kernel on the HOST machine, with AutoFDO and Propeller
   build config
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   then
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>

   "<autofdo_profile>" is the profile collected when doing a
   non-Propeller AutoFDO build. This step builds a kernel that has
   the same optimization level as AutoFDO, plus a metadata section
   that records basic block information. This kernel image runs as
   fast as an AutoFDO optimized kernel.

2) Install the kernel on test/production machines.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number,
   like 500009, for this purpose.
   For Intel platforms:
      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
        -o <perf_file> -- <loadtest>
   For AMD platforms:
      The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2
      # To see if Zen3 supports LBR:
      $ cat /proc/cpuinfo | grep " brs"
      # To see if Zen4 supports LBR:
      $ cat /proc/cpuinfo | grep amd_lbr_v2
      # If the result is yes, then collect the profile using:
      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
        -N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the HOST machine.

5) Generate Propeller profile:
   $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
     --format=propeller --propeller_output_module_name \
     --out=<propeller_profile_prefix>_cc_profile.txt \
     --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "create_llvm_prof" is the profile conversion tool, and a prebuilt
   binary for Linux can be found on
   https://github.com/google/autofdo/releases/tag/v0.30.1 (it can also
   be built from source).

   "<propeller_profile_prefix>" can be something like
   "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

6) Rebuild the kernel using the AutoFDO and Propeller profile files.
      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y
   and
      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
        CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>

Co-developed-by: Han Shen <shenhan@xxxxxxxxxx>
Signed-off-by: Han Shen <shenhan@xxxxxxxxxx>
Signed-off-by: Rong Xu <xur@xxxxxxxxxx>
Suggested-by: Sriraman Tallam <tmsriram@xxxxxxxxxx>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@xxxxxxxxxx>
Suggested-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
Suggested-by: Stephane Eranian <eranian@xxxxxxxxxx>
---
 Documentation/dev-tools/index.rst     |   1 +
 Documentation/dev-tools/propeller.rst | 188 ++++++++++++++++++++++++++
 MAINTAINERS                           |   7 +
 Makefile                              |   1 +
 arch/Kconfig                          |  22 +++
 arch/x86/Kconfig                      |   1 +
 arch/x86/boot/compressed/Makefile     |   3 +
 arch/x86/kernel/vmlinux.lds.S         |   4 +
 arch/x86/platform/efi/Makefile        |   1 +
 drivers/firmware/efi/libstub/Makefile |   2 +
 include/asm-generic/vmlinux.lds.h     |   8 +-
 scripts/Makefile.lib                  |  10 ++
 scripts/Makefile.propeller            |  25 ++++
 tools/objtool/check.c                 |   1 +
 14 files changed, 270 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/dev-tools/propeller.rst
 create mode 100644 scripts/Makefile.propeller

diff --git a/Documentation/dev-tools/index.rst b/Documentation/dev-tools/index.rst
index
46636e4efe15..16e33eadb73b 100644
--- a/Documentation/dev-tools/index.rst
+++ b/Documentation/dev-tools/index.rst
@@ -33,6 +33,7 @@ Documentation/dev-tools/testing-overview.rst
    ktap
    checkuapi
    autofdo
+   propeller
 
 
 .. only::  subproject and html
diff --git a/Documentation/dev-tools/propeller.rst b/Documentation/dev-tools/propeller.rst
new file mode 100644
index 000000000000..15ef0e6d973e
--- /dev/null
+++ b/Documentation/dev-tools/propeller.rst
@@ -0,0 +1,188 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+Using Propeller with the Linux kernel
+=====================================
+
+This enables Propeller build support for the kernel when using the Clang
+compiler. Propeller is a profile-guided optimization (PGO) method used
+to optimize binary executables. Like AutoFDO, it utilizes hardware
+sampling to gather information about the frequency of execution of
+different code paths within a binary. Unlike AutoFDO, this information
+is then used right before the linking phase to optimize (among others)
+block layout within and across functions.
+
+A few important notes about adopting Propeller optimization:
+
+#. Although it can be used as a standalone optimization step, it is
+   strongly recommended to apply Propeller on top of AutoFDO,
+   AutoFDO+ThinLTO or Instrument FDO. The rest of this document
+   assumes this paradigm.
+
+#. Propeller uses another round of profiling on top of
+   AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
+   "build-afdo - train-afdo - build-propeller - train-propeller -
+   build-optimized".
+
+#. Propeller requires the LLVM 19 release or later for Clang/Clang++
+   and the linker (ld.lld).
+
+#. In addition to the LLVM toolchain, Propeller requires a profiling
+   conversion tool: https://github.com/google/autofdo with a release
+   after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.
+
+The Propeller optimization process involves the following steps:
+
+#.
Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
+   you would normally do, but with a set of compile-time / link-time
+   flags, so that a special metadata section is created within the
+   kernel binary. The special section is only intended to be used by
+   the profiling tool; it is not part of the runtime image, nor does
+   it change the kernel's runtime text sections.
+
+#. Profiling: The above kernel is then run with a representative
+   workload to gather execution frequency data. This data is collected
+   using hardware sampling, via perf. Propeller is most effective on
+   platforms supporting advanced PMU features like LBR on Intel
+   machines. This step is the same as profiling the kernel for AutoFDO
+   (the exact perf parameters can be different).
+
+#. Propeller profile generation: The perf output file is converted to
+   a pair of Propeller profiles via an offline tool.
+
+#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
+   binary as you would normally do, but with a compile-time /
+   link-time flag to pick up the Propeller compile-time and link-time
+   profiles. This build step uses 3 profiles - the AutoFDO profile,
+   the Propeller compile-time profile and the Propeller link-time
+   profile.
+
+#. Deployment: The optimized kernel binary is deployed and used
+   in production environments, providing improved performance
+   and reduced latency.
+
+Preparation
+===========
+
+Configure the kernel with:
+
+   .. code-block:: make
+
+      CONFIG_AUTOFDO_CLANG=y
+      CONFIG_PROPELLER_CLANG=y
+
+Customization
+=============
+
+You can enable or disable the Propeller build for individual files and
+directories by adding a line similar to the following to the
+respective kernel Makefile:
+
+- For enabling a single file (e.g. foo.o)
+
+  .. code-block:: make
+
+     PROPELLER_PROFILE_foo.o := y
+
+- For enabling all files in one directory
+
+  .. code-block:: make
+
+     PROPELLER_PROFILE := y
+
+- For disabling one file
+
+  ..
code-block:: make
+
+     PROPELLER_PROFILE_foo.o := n
+
+- For disabling all files in one directory
+
+  .. code-block:: make
+
+     PROPELLER_PROFILE := n
+
+
+Workflow
+========
+
+Here is an example workflow for building an AutoFDO+Propeller kernel:
+
+1) Assuming an AutoFDO profile is already collected following
+   instructions in the AutoFDO document, build the kernel on the HOST
+   machine, with AutoFDO and Propeller build configs:
+
+   .. code-block:: make
+
+      CONFIG_AUTOFDO_CLANG=y
+      CONFIG_PROPELLER_CLANG=y
+
+   and
+
+   .. code-block:: sh
+
+      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>
+
+2) Install the kernel on the TEST machine.
+
+3) Run the load tests. The '-c' option in perf specifies the sample
+   event period. We suggest using a suitable prime number, like 500009,
+   for this purpose.
+
+   - For Intel platforms:
+
+     .. code-block:: sh
+
+        $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c \
+          <count> -o <perf_file> -- <loadtest>
+
+   - For AMD platforms:
+
+     .. code-block:: sh
+
+        $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k \
+          -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+   Note you can repeat the above steps to collect multiple <perf_file>s.
+
+4) (Optional) Download the raw perf file(s) to the HOST machine.
+
+5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to generate the Propeller profile.
+
+   .. code-block:: sh
+
+      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
+                         --format=propeller --propeller_output_module_name \
+                         --out=<propeller_profile_prefix>_cc_profile.txt \
+                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
+
+   "<propeller_profile_prefix>" can be something like
+   "/home/user/dir/any_string".
+
+   This command generates a pair of Propeller profiles:
+   "<propeller_profile_prefix>_cc_profile.txt" and
+   "<propeller_profile_prefix>_ld_profile.txt".
+
+   If more than one perf_file was collected in the previous step,
+   you can create a temp list file "<perf_file_list>" with each line
+   containing one perf file name and run:
+
+   .. code-block:: sh
+
+      $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> \
+                         --format=propeller --propeller_output_module_name \
+                         --out=<propeller_profile_prefix>_cc_profile.txt \
+                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
+
+6) Rebuild the kernel using the AutoFDO and Propeller profiles.
+
+   .. code-block:: make
+
+      CONFIG_AUTOFDO_CLANG=y
+      CONFIG_PROPELLER_CLANG=y
+
+   and
+
+   .. code-block:: sh
+
+      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
diff --git a/MAINTAINERS b/MAINTAINERS
index 8a89e7f0d9d5..0c7f3cebe4fe 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -17974,6 +17974,13 @@ S:	Maintained
 F:	include/linux/psi*
 F:	kernel/sched/psi.c
 
+PROPELLER BUILD
+M:	Rong Xu <xur@xxxxxxxxxx>
+M:	Han Shen <shenhan@xxxxxxxxxx>
+S:	Supported
+F:	Documentation/dev-tools/propeller.rst
+F:	scripts/Makefile.propeller
+
 PRINTK
 M:	Petr Mladek <pmladek@xxxxxxxx>
 R:	Steven Rostedt <rostedt@xxxxxxxxxxx>
diff --git a/Makefile b/Makefile
index 5ae30cc94a26..85a96d973f20 100644
--- a/Makefile
+++ b/Makefile
@@ -1025,6 +1025,7 @@ include-$(CONFIG_KCOV)		+= scripts/Makefile.kcov
 include-$(CONFIG_RANDSTRUCT)	+= scripts/Makefile.randstruct
 include-$(CONFIG_GCC_PLUGINS)	+= scripts/Makefile.gcc-plugins
 include-$(CONFIG_AUTOFDO_CLANG)	+= scripts/Makefile.autofdo
+include-$(CONFIG_PROPELLER_CLANG)	+= scripts/Makefile.propeller
 
 include $(addprefix $(srctree)/, $(include-y))
 
diff --git a/arch/Kconfig b/arch/Kconfig
index e12599c4ab63..5b136e904400 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -822,6 +822,28 @@ config AUTOFDO_CLANG
 
 	  If unsure, say N.
 
+config ARCH_SUPPORTS_PROPELLER_CLANG
+	bool
+
+config PROPELLER_CLANG
+	bool "Enable Clang's Propeller build"
+	depends on ARCH_SUPPORTS_PROPELLER_CLANG
+	depends on AUTOFDO_CLANG
+	depends on CC_IS_CLANG && CLANG_VERSION >= 190000
+	help
+	  This option enables Clang's Propeller build, which
+	  is built on top of the AutoFDO build. When the Propeller
+	  profiles are specified in variable CLANG_PROPELLER_PROFILE_PREFIX
+	  during the build process, Clang uses the profiles to
+	  optimize the kernel.
+
+	  If no profile is specified, Propeller options are
+	  still passed to Clang to facilitate the collection
+	  of perf data for creating the Propeller profiles in
+	  subsequent builds.
+
+	  If unsure, say N.
+
 config ARCH_SUPPORTS_CFI_CLANG
 	bool
 	help
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index dca526b1364f..6fb5269d39b0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -123,6 +123,7 @@ config X86
 	select ARCH_SUPPORTS_LTO_CLANG
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_AUTOFDO_CLANG
+	select ARCH_SUPPORTS_PROPELLER_CLANG	if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF	if X86_CMPXCHG64
 	select ARCH_USE_MEMTEST
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index f2051644de94..35d19b4e6361 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -17,6 +17,9 @@
 #	(see scripts/Makefile.lib size_append)
 #	compressed vmlinux.bin.all + u32 size of vmlinux.bin.all
 
+# Do not run the Propeller optimizer for early boot code.
+PROPELLER_PROFILE := n
+
 targets := vmlinux vmlinux.bin vmlinux.bin.gz vmlinux.bin.bz2 vmlinux.bin.lzma \
 	vmlinux.bin.xz vmlinux.bin.lzo vmlinux.bin.lz4 vmlinux.bin.zst
 
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3509afc6a672..167dd05323cf 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -440,6 +440,10 @@ SECTIONS
 
 	STABS_DEBUG
 	DWARF_DEBUG
+#ifdef CONFIG_PROPELLER_CLANG
+	.llvm_bb_addr_map : { *(.llvm_bb_addr_map) }
+#endif
+
 	ELF_DETAILS
 
 	DISCARDS
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index 543df9a1379d..e0c846b6d636 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0
 KASAN_SANITIZE := n
 GCOV_PROFILE := n
+PROPELLER_PROFILE := n
 
 obj-$(CONFIG_EFI) 		+= memmap.o quirks.o efi.o efi_$(BITS).o \
 				   efi_stub_$(BITS).o
diff --git a/drivers/firmware/efi/libstub/Makefile b/drivers/firmware/efi/libstub/Makefile
index 06f0428a723c..55ca5250df1a 100644
--- a/drivers/firmware/efi/libstub/Makefile
+++ b/drivers/firmware/efi/libstub/Makefile
@@ -56,6 +56,8 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_CFI), $(KBUILD_CFLAGS))
 # disable LTO
 KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS))
 
+PROPELLER_PROFILE := n
+
 lib-y				:= efi-stub-helper.o gop.o secureboot.o tpm.o \
 				   file.o mem.o random.o randomalloc.o pci.o \
 				   skip_spaces.o lib-cmdline.o lib-ctype.o \
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 7d9dc8a3c046..ea3d8bf51edd 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -95,14 +95,14 @@
  * With LTO_CLANG, the linker also splits sections by default, so we need
  * these macros to combine the sections during the final link.
  *
- * With LTO_CLANG, the linker also splits sections by default, so we need
- * these macros to combine the sections during the final link.
+ * CONFIG_AUTOFDO_CLANG and CONFIG_PROPELLER_CLANG will also split text sections
+ * and cluster them at link time.
  *
  * RODATA_MAIN is not used because existing code already defines .rodata.x
  * sections to be brought in with rodata.
  */
 #if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
-defined(CONFIG_AUTOFDO_CLANG)
+defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
 #else
 #define TEXT_MAIN .text
@@ -612,7 +612,7 @@ defined(CONFIG_AUTOFDO_CLANG)
  * first when in these builds.
  */
 #if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
-defined(CONFIG_AUTOFDO_CLANG)
+defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 #define TEXT_TEXT							\
 		*(.text.asan.* .text.tsan.*)				\
 		*(.text.unknown .text.unknown.*)			\
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index c2cab5adaf25..e239fa709c20 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -219,6 +219,16 @@ _c_flags += $(if $(patsubst n%,, \
 	$(CFLAGS_AUTOFDO_CLANG))
 endif
 
+#
+# Enable Clang's Propeller build flags for a file or directory depending on
+# variables AUTOFDO_PROFILE_obj.o and PROPELLER_PROFILE.
+#
+ifeq ($(CONFIG_PROPELLER_CLANG),y)
+_c_flags += $(if $(patsubst n%,, \
+	$(AUTOFDO_PROFILE_$(basetarget).o)$(AUTOFDO_PROFILE)$(PGO_PROFILE)$(PROPELLER_PROFILE)y), \
+	$(CFLAGS_PROPELLER_CLANG))
+endif
+
 # $(src) for including checkin headers from generated source files
 # $(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)
diff --git a/scripts/Makefile.propeller b/scripts/Makefile.propeller
new file mode 100644
index 000000000000..0c9318be5f64
--- /dev/null
+++ b/scripts/Makefile.propeller
@@ -0,0 +1,25 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Enable available and selected Clang Propeller features.
+# Propeller requires debug information to embed module names in the profiles.
+CFLAGS_PROPELLER_CLANG := -fdebug-info-for-profiling
+
+ifdef CLANG_PROPELLER_PROFILE_PREFIX
+CFLAGS_PROPELLER_CLANG += -fbasic-block-sections=list=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt -ffunction-sections
+KBUILD_LDFLAGS += --symbol-ordering-file=$(CLANG_PROPELLER_PROFILE_PREFIX)_ld_profile.txt --no-warn-symbol-ordering
+else
+CFLAGS_PROPELLER_CLANG += -fbasic-block-sections=labels
+endif
+
+ifdef CONFIG_LTO_CLANG
+ifdef CONFIG_LTO_CLANG_THIN
+ifdef CLANG_PROPELLER_PROFILE_PREFIX
+KBUILD_LDFLAGS += --lto-basic-block-sections=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt
+else
+KBUILD_LDFLAGS += --lto-basic-block-sections=labels
+endif
+endif
+else
+endif
+
+export CFLAGS_PROPELLER_CLANG
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 254913498c3c..7cea8ba53cf4 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -4489,6 +4489,7 @@ static int validate_ibt(struct objtool_file *file)
 		    !strcmp(sec->name, "__mcount_loc")			||
 		    !strcmp(sec->name, ".kcfi_traps")			||
 		    !strcmp(sec->name, ".llvm.call-graph-profile")	||
+		    !strcmp(sec->name, ".llvm_bb_addr_map")		||
 		    strstr(sec->name, "__patchable_function_entries"))
 			continue;
 
-- 
2.46.0.rc1.232.g9752f9e123-goog
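
Note on the profile file naming used in steps 5 and 6 of the workflow:
the same <propeller_profile_prefix> has to be used for create_llvm_prof
and for CLANG_PROPELLER_PROFILE_PREFIX, because Makefile.propeller
appends "_cc_profile.txt" and "_ld_profile.txt" to it. The Python sketch
below is not part of the patch; it only illustrates that convention by
assembling the create_llvm_prof argument list from the flags shown in
the documentation. The function names (profile_paths,
create_llvm_prof_cmd) and the example paths are hypothetical, and the
script builds the command without running the tool.

```python
from typing import List, Tuple


def profile_paths(prefix: str) -> Tuple[str, str]:
    """Return the (compile-time, link-time) profile names for a prefix.

    Mirrors the suffixes that scripts/Makefile.propeller appends to
    CLANG_PROPELLER_PROFILE_PREFIX.
    """
    return f"{prefix}_cc_profile.txt", f"{prefix}_ld_profile.txt"


def create_llvm_prof_cmd(vmlinux: str, perf_input: str, prefix: str,
                         is_list_file: bool = False) -> List[str]:
    """Assemble the create_llvm_prof argument list from the workflow text.

    When is_list_file is True, perf_input names a file that lists one
    perf file per line, and it is passed with the '@' prefix.
    """
    cc_profile, ld_profile = profile_paths(prefix)
    profile_arg = f"@{perf_input}" if is_list_file else perf_input
    return [
        "create_llvm_prof",
        f"--binary={vmlinux}",
        f"--profile={profile_arg}",
        "--format=propeller",
        "--propeller_output_module_name",
        f"--out={cc_profile}",
        f"--propeller_symorder={ld_profile}",
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(" ".join(create_llvm_prof_cmd("vmlinux", "perf.data", "/tmp/prop")))
```

The prefix passed here ("/tmp/prop" in the example) is the value that
would then be given to make as CLANG_PROPELLER_PROFILE_PREFIX in step 6.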