On Wed, Oct 02, 2024 at 04:34:00PM -0700, Rong Xu wrote:
> +Preparation
> +===========
> +
> +Configure the kernel with:
> +
> +   .. code-block:: make
> +
> +      CONFIG_AUTOFDO_CLANG=y
> +
> +Customization
> +=============
> +
> +You can enable or disable AutoFDO build for individual files and directories by
> +adding a line similar to the following to the respective kernel Makefile:
> +
> +- For enabling a single file (e.g. foo.o)
> +
> +   .. code-block:: make
> +
> +      AUTOFDO_PROFILE_foo.o := y
> +
> +- For enabling all files in one directory
> +
> +   .. code-block:: make
> +
> +      AUTOFDO_PROFILE := y
> +
> +- For disabling one file
> +
> +   .. code-block:: make
> +
> +      AUTOFDO_PROFILE_foo.o := n
> +
> +- For disabling all files in one directory
> +
> +   .. code-block:: make
> +
> +      AUTOFDO_PROFILE := n
> +
> +
> +Workflow
> +========
> +
> +Here is an example workflow for AutoFDO kernel:
> +
> +1) Build the kernel on the HOST machine with LLVM enabled, for example,
> +
> +   .. code-block:: make
> +
> +      $ make menuconfig LLVM=1
> +
> +
> +   Turn on AutoFDO build config:
> +
> +   .. code-block:: make
> +
> +      CONFIG_AUTOFDO_CLANG=y
> +
> +   With a configuration that has LLVM enabled, use the following command:
> +
> +   .. code-block:: sh
> +
> +      $ scripts/config -e AUTOFDO_CLANG
> +
> +   After getting the config, build with
> +
> +   .. code-block:: make
> +
> +      $ make LLVM=1
> +
> +2) Install the kernel on the TEST machine.
> +
> +3) Run the load tests. The '-c' option in perf specifies the sample
> +   event period. We suggest using a suitable prime number, like 500009,
> +   for this purpose.
> +
> +   - For Intel platforms:
> +
> +     .. code-block:: sh
> +
> +        $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> +
> +   - For AMD platforms:
> +     The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check,
> +     For Zen3:
> +
> +     .. code-block:: sh
> +
> +        $ cat /proc/cpuinfo | grep " brs"
> +
> +     For Zen4:
> +
> +     .. code-block:: sh
> +
> +        $ cat /proc/cpuinfo | grep amd_lbr_v2
> +
> +     The following command generates the perf data file:
> +
> +     .. code-block:: sh
> +
> +        $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b \
> +          -c <count> -o <perf_file> -- <loadtest>
> +
> +4) (Optional) Download the raw perf file to the HOST machine.
> +
> +5) To generate an AutoFDO profile, two offline tools are available:
> +   create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
> +   of the AutoFDO project and can be found on GitHub
> +   (https://github.com/google/autofdo), version v0.30.1 or later.
> +   The llvm-profgen tool is included in the LLVM compiler itself. It's
> +   important to note that the version of llvm-profgen doesn't need to match
> +   the version of Clang. It needs to be the LLVM 19 release of Clang
> +   or later, or just from the LLVM trunk.
> +
> +   .. code-block:: sh
> +
> +      $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>
> +
> +   or
> +
> +   .. code-block:: sh
> +
> +      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary -o <profile_file>
> +
> +   Note that multiple AutoFDO profile files can be merged into one via:
> +
> +   .. code-block:: sh
> +
> +      $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>
> +
> +
> +6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1,
> +   (Note CONFIG_AUTOFDO_CLANG needs to be enabled):
> +
> +   .. code-block:: sh
> +
> +      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
> +

Can this be done without the endless ... code-block nonsense?
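
For anyone following the quoted workflow, steps 3, 5 and 6 can be strung together roughly as below. This is only a dry-run sketch of the Intel variant: it prints the commands instead of running them. The variable names (PERIOD, PERF_FILE, VMLINUX, PROFILE, LOADTEST), their default values, and the autofdo_commands helper are illustrative assumptions, not part of the patch. The commands and flags themselves are taken from the quoted text.

```shell
#!/bin/sh
# Dry-run sketch of steps 3, 5 and 6 of the quoted workflow (Intel variant).
# All variable names and default values below are hypothetical placeholders.
PERIOD="${PERIOD:-500009}"            # prime sample period suggested in the text
PERF_FILE="${PERF_FILE:-perf.data}"   # raw perf output (assumed file name)
VMLINUX="${VMLINUX:-vmlinux}"         # kernel image built in step 1
PROFILE="${PROFILE:-kernel.afdo}"     # AutoFDO profile (assumed file name)
LOADTEST="${LOADTEST:-./loadtest}"    # load test command (assumed)

autofdo_commands() {
    # Step 3: sample taken branches in kernel mode while the load test runs.
    echo "perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c $PERIOD -o $PERF_FILE -- $LOADTEST"
    # Step 5: convert the perf data into an AutoFDO profile.
    echo "llvm-profgen --kernel --binary=$VMLINUX --perfdata=$PERF_FILE -o $PROFILE"
    # Step 6: rebuild with the profile; CONFIG_AUTOFDO_CLANG=y must be set.
    echo "make LLVM=1 CLANG_AUTOFDO_PROFILE=$PROFILE"
}

# Print the three commands, one per line, without executing them.
autofdo_commands
```

Printing rather than executing keeps the sketch safe to run on a machine without perf or an LLVM toolchain. On AMD, the perf line would use the --pfm-events form from the quoted text instead.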