Select ARCH_SUPPORTS_AUTOFDO_CLANG to allow AUTOFDO_CLANG to be selected.

On ARM64, ETM traces can be recorded and converted to AutoFDO profiles.
Experiments on Android show a 4% improvement in cold app startup time
and a 13% improvement in binder benchmarks.

Signed-off-by: Yabin Cui <yabinc@xxxxxxxxxx>
---
 Documentation/dev-tools/autofdo.rst | 18 +++++++++++++++++-
 arch/arm64/Kconfig                  |  1 +
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/Documentation/dev-tools/autofdo.rst b/Documentation/dev-tools/autofdo.rst
index 1f0a451e9ccd..f0952e3e8490 100644
--- a/Documentation/dev-tools/autofdo.rst
+++ b/Documentation/dev-tools/autofdo.rst
@@ -55,7 +55,7 @@ process consists of the following steps:
    workload to gather execution frequency data. This data is
    collected using hardware sampling, via perf. AutoFDO is most
    effective on platforms supporting advanced PMU features like
-   LBR on Intel machines.
+   LBR on Intel machines and ETM traces on ARM machines.
 
 #. AutoFDO profile generation: Perf output file is converted to
    the AutoFDO profile via offline tools.
@@ -141,6 +141,22 @@ Here is an example workflow for AutoFDO kernel:
 
       $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
 
+   - For ARM platforms:
+
+     Follow the instructions in the `Linaro OpenCSD document
+     <https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md>`_
+     to record ETM traces for AutoFDO::
+
+      $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>
+      $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il
+
+     For ARM platforms running Android, follow the instructions in the
+     `Android simpleperf document
+     <https://android.googlesource.com/platform/system/extras/+/main/simpleperf/doc/collect_etm_data_for_autofdo.md>`_
+     to record ETM traces for AutoFDO::
+
+      $ simpleperf record -e cs-etm:k -a -o <perf_file> -- <loadtest>
+
 4) (Optional) Download the raw perf file to the host machine.
 
 5) To generate an AutoFDO profile, two offline tools are available:
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index fd9df6dcc593..c3814df5e391 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -103,6 +103,7 @@ config ARM64
 	select ARCH_SUPPORTS_PER_VMA_LOCK
 	select ARCH_SUPPORTS_HUGE_PFNMAP if TRANSPARENT_HUGEPAGE
 	select ARCH_SUPPORTS_RT
+	select ARCH_SUPPORTS_AUTOFDO_CLANG
 	select ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION if COMPAT
 	select ARCH_WANT_DEFAULT_BPF_JIT
-- 
2.47.0.338.g60cca15819-goog