Enabling an AutoFDO build requires users to explicitly set
CONFIG_AUTOFDO_CLANG. The support code is in Commit 315ad8780a129e82
(kbuild: Add AutoFDO support for Clang build).

The CONFIG_AUTOFDO_CLANG config, even if selected by the user, will not
be enabled unless ARCH_SUPPORTS_AUTOFDO_CLANG is present. We are not
enabling this for all architectures because AutoFDO's optimized build
relies on Last Branch Records (LBR) which aren't available on all
architectures.

-Rong

On Mon, Dec 9, 2024 at 8:20 AM Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Mon, Nov 18, 2024 at 02:25:40PM -0800, Yabin Cui wrote:
> > Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be
> > selected.
> >
> > On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
> > Experiments on Android show 4% improvement in cold app startup time
> > and 13% improvement in binder benchmarks.
> >
> > Signed-off-by: Yabin Cui <yabinc@xxxxxxxxxx>
> > ---
> >
> > Change-Logs in V2:
> >
> > 1. Use "For ARM platforms with ETM trace" in autofdo.rst.
> > 2. Create an issue and a change to use extbinary format in instructions:
> >    https://github.com/Linaro/OpenCSD/issues/65
> >    https://android-review.googlesource.com/c/platform/system/extras/+/3362107
> >
> >  Documentation/dev-tools/autofdo.rst | 18 +++++++++++++++++-
> >  arch/arm64/Kconfig                  |  1 +
> >  2 files changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst
> > index 1f0a451e9ccd..a890e84a2fdd 100644
> > --- a/Documentation/dev-tools/autofdo.rst
> > +++ b/Documentation/dev-tools/autofdo.rst
> > @@ -55,7 +55,7 @@ process consists of the following steps:
> >     workload to gather execution frequency data. This data is
> >     collected using hardware sampling, via perf. AutoFDO is most
> >     effective on platforms supporting advanced PMU features like
> > -   LBR on Intel machines.
> > +   LBR on Intel machines, ETM traces on ARM machines.
> >
> >  #. AutoFDO profile generation: Perf output file is converted to
> >     the AutoFDO profile via offline tools.
> > @@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
> >
> >        $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
> >
> > +   - For ARM platforms with ETM trace:
> > +
> > +     Follow the instructions in the `Linaro OpenCSD document
> > +     https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md`_
> > +     to record ETM traces for AutoFDO::
> > +
> > +      $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>
> > +      $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il
> > +
> > +     For ARM platforms running Android, follow the instructions in the
> > +     `Android simpleperf document
> > +     <https://android.googlesource.com/platform/system/extras/+/main/simpleperf/doc/collect_etm_data_for_autofdo.md>`_
> > +     to record ETM traces for AutoFDO::
> > +
> > +      $ simpleperf record -e cs-etm:k -a -o <perf_file> -- <loadtest>
> > +
> >  4) (Optional) Download the raw perf file to the host machine.
> >
> >  5) To generate an AutoFDO profile, two offline tools are available:
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index fd9df6dcc593..c3814df5e391 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -103,6 +103,7 @@ config ARM64
> >  	select ARCH_SUPPORTS_PER_VMA_LOCK
> >  	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
> >  	select ARCH_SUPPORTS_RT
> > +	select ARCH_SUPPORTS_AUTOFDO_CLANG
> >  	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
> >  	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
> >  	select ARCH_WANT_DEFAULT_BPF_JIT
>
> After this change, both arm64 and x86 select this option unconditionally
> and with no apparent support code being added. So what is actually
> required in order to select ARCH_SUPPORTS_AUTOFDO_CLANG and why isn't
> it just available for all architectures instead?
>
> Will