On Fri, Nov 1, 2024 at 11:02 AM Masahiro Yamada <masahiroy@xxxxxxxxxx> wrote:
>
> On Thu, Oct 24, 2024 at 7:44 AM Rong Xu <xur@xxxxxxxxxx> wrote:
> >
> > Add the build support for using Clang's AutoFDO. Building the kernel
> > with AutoFDO does not reduce the optimization level from the
> > compiler. AutoFDO uses hardware sampling to gather information about
> > the frequency of execution of different code paths within a binary.
> > This information is then used to guide the compiler's optimization
> > decisions, resulting in a more efficient binary. Experiments
> > showed that the kernel can improve up to 10% in latency.
> >
> > The support requires a Clang compiler after LLVM 17. This submission
> > is limited to x86 platforms that support PMU features like LBR on
> > Intel machines and AMD Zen3 BRS. Support for SPE on ARM 1,
> > and BRBE on ARM 1 is part of planned future work.
> >
> > Here is an example workflow for AutoFDO kernel:
> >
> > 1) Build the kernel on the host machine with LLVM enabled, for example,
> >       $ make menuconfig LLVM=1
> >    Turn on AutoFDO build config:
> >       CONFIG_AUTOFDO_CLANG=y
> >    With a configuration that has LLVM enabled, use the following
> >    command:
> >       scripts/config -e AUTOFDO_CLANG
> >    After getting the config, build with
> >       $ make LLVM=1
> >
> > 2) Install the kernel on the test machine.
> >
> > 3) Run the load tests. The '-c' option in perf specifies the sample
> >    event period. We suggest using a suitable prime number,
> >    like 500009, for this purpose.
> >    For Intel platforms:
> >       $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
> >         -o <perf_file> -- <loadtest>
> >    For AMD platforms:
> >       The supported system are: Zen3 with BRS, or Zen4 with amd_lbr_v2
> >       For Zen3:
> >        $ cat proc/cpuinfo | grep " brs"
> >       For Zen4:
> >        $ cat proc/cpuinfo | grep amd_lbr_v2
> >       $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
> >         -N -b -c <count> -o <perf_file> -- <loadtest>
> >
> > 4) (Optional) Download the raw perf file to the host machine.
> >
> > 5) To generate an AutoFDO profile, two offline tools are available:
> >    create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part
> >    of the AutoFDO project and can be found on GitHub
> >    (https://github.com/google/autofdo), version v0.30.1 or later. The
> >    llvm_profgen tool is included in the LLVM compiler itself. It's
> >    important to note that the version of llvm_profgen doesn't need to
> >    match the version of Clang. It needs to be the LLVM 19 release or
> >    later, or from the LLVM trunk.
> >       $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> \
> >         -o <profile_file>
> >    or
> >       $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
> >         --format=extbinary --out=<profile_file>
> >
> >    Note that multiple AutoFDO profile files can be merged into one via:
> >       $ llvm-profdata merge -o <profile_file> <profile_1> ... \
> >         <profile_n>
> >
> > 6) Rebuild the kernel using the AutoFDO profile file with the same config
> >    as step 1, (Note CONFIG_AUTOFDO_CLANG needs to be enabled):
> >       $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
> >
> > Co-developed-by: Han Shen <shenhan@xxxxxxxxxx>
> > Signed-off-by: Han Shen <shenhan@xxxxxxxxxx>
> > Signed-off-by: Rong Xu <xur@xxxxxxxxxx>
> > Suggested-by: Sriraman Tallam <tmsriram@xxxxxxxxxx>
> > Suggested-by: Krzysztof Pszeniczny <kpszeniczny@xxxxxxxxxx>
> > Suggested-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
> > Suggested-by: Stephane Eranian <eranian@xxxxxxxxxx>
> > Tested-by: Yonghong Song <yonghong.song@xxxxxxxxx>
> >
> >
> > +Workflow
> > +========
> > +
> > +Here is an example workflow for AutoFDO kernel:
> > +
> > +1) Build the kernel on the host machine with LLVM enabled,
> > +   for example, ::
> > +
> > +      $ make menuconfig LLVM=1
> > +
> > +   Turn on AutoFDO build config::
> > +
> > +      CONFIG_AUTOFDO_CLANG=y
> > +
> > +   With a configuration that with LLVM enabled, use the following command::
> > +
> > +      $ scripts/config -e AUTOFDO_CLANG
> > +
> > +   After getting the config, build with ::
> > +
> > +      $ make LLVM=1
> > +
> > +2) Install the kernel on the test machine.
> > +
> > +3) Run the load tests. The '-c' option in perf specifies the sample
> > +   event period. We suggest using a suitable prime number, like 500009,
> > +   for this purpose.
> > +
> > +   - For Intel platforms::
> > +
> > +      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> > +
> > +   - For AMD platforms::
>
> I am not sure if this double-colon is needed
> when the next line is not code.

Thanks for catching this. We don't mean to use "::" here. It should be ":"
and there is supposed to be a blank line after this. Also a blank line
before "For Zen3::". I will fix this in the patch.

>
> >
> > +     The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2.
> > +     To check,
> > +     For Zen3::
> > +
> > +      $ cat proc/cpuinfo | grep " brs"
> > +
> > +     For Zen4::
> > +
> > +      $ cat proc/cpuinfo | grep amd_lbr_v2
> > +
> > +     The following command generated the perf data file::
> > +
> > +      $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> > +
> > +4) (Optional) Download the raw perf file to the host machine.
> > +
> > +5) To generate an AutoFDO profile, two offline tools are available:
> > +   create_llvm_prof and llvm_profgen. The create_llvm_prof tool is part
> > +   of the AutoFDO project and can be found on GitHub
> > +   (https://github.com/google/autofdo), version v0.30.1 or later.
> > +   The llvm_profgen tool is included in the LLVM compiler itself. It's
> > +   important to note that the version of llvm_profgen doesn't need to match
> > +   the version of Clang. It needs to be the LLVM 19 release of Clang
> > +   or later, or just from the LLVM trunk. ::
> > +
> > +      $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
> > +
> > +   or ::
> > +
> > +      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>
> > +
> > +   Note that multiple AutoFDO profile files can be merged into one via::
> > +
> > +      $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
> > +
> > +6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1,
> > +   (Note CONFIG_AUTOFDO_CLANG needs to be enabled)::
> > +
> > +      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
> > +
>
> Trailing blank line.
>
> .git/rebase-apply/patch:187: new blank line at EOF.

Will remove the blank line.

>
> >
>
>
> --
> Best Regards
> Masahiro Yamada
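
For illustration only, the platform check in step 3 could be sketched as a
small shell helper that picks the perf sampling event. The function name
autofdo_perf_event and the GenuineIntel vendor test are assumptions made for
this sketch and are not part of the patch; the event names and the
"brs" / "amd_lbr_v2" flags are taken from the workflow text. It reads
/proc/cpuinfo unless another file is given.

```shell
#!/bin/sh
# Sketch only: choose the perf event arguments for an AutoFDO profiling run.
# autofdo_perf_event is a hypothetical helper name, not something the patch adds.
autofdo_perf_event() {
    cpuinfo="${1:-/proc/cpuinfo}"
    if grep -q ' brs' "$cpuinfo" || grep -q 'amd_lbr_v2' "$cpuinfo"; then
        # AMD Zen3 (BRS) or Zen4 (amd_lbr_v2), the two cases named in the patch.
        printf '%s\n' '--pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k'
    elif grep -q 'GenuineIntel' "$cpuinfo"; then
        # Assumption: an Intel CPU has LBR; the patch gives no explicit check.
        printf '%s\n' '-e BR_INST_RETIRED.NEAR_TAKEN:k'
    else
        printf '%s\n' 'no supported branch-sampling feature found' >&2
        return 1
    fi
}
```

The printed arguments would then be placed in front of the remaining options
from step 3 (-a -N -b -c <count> -o <perf_file> -- <loadtest>).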