On Sat, Jun 12, 2021 at 9:59 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Wed, Apr 07, 2021 at 02:17:04PM -0700, Bill Wendling wrote:
> > From: Sami Tolvanen <samitolvanen@xxxxxxxxxx>
> >
> > Enable the use of clang's Profile-Guided Optimization[1]. To generate a
> > profile, the kernel is instrumented with PGO counters, a representative
> > workload is run, and the raw profile data is collected from
> > /sys/kernel/debug/pgo/profraw.
> >
> > The raw profile data must be processed by clang's "llvm-profdata" tool
> > before it can be used during recompilation:
> >
> >   $ cp /sys/kernel/debug/pgo/profraw vmlinux.profraw
> >   $ llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> >
> > Multiple raw profiles may be merged during this step.
> >
> > The data can now be used by the compiler:
> >
> >   $ make LLVM=1 KCFLAGS=-fprofile-use=vmlinux.profdata ...
> >
> > This initial submission is restricted to x86, as that's the platform we
> > know works. This restriction can be lifted once other platforms have
> > been verified to work with PGO.
>
> *sigh*, and not a single x86 person on Cc, how nice :-/
>
This tool is generic and, although it's first enabled for x86, it
contains no x86-specific code. We're restricting it to x86 only because
that's the platform we tested on.

> > Note that this method of profiling the kernel is clang-native, unlike
> > the clang support in kernel/gcov.
> >
> > [1] https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization
>
> Also, and I don't see this answered *anywhere*, why are you not using
> perf for this? Your link even mentions Sampling Profilers (and I happen
> to know there's been significant effort to make perf output work as
> input for the PGO passes of the various compilers).
>
Instrumentation-based (non-sampling) profiling gives us a better
context-sensitive profile, making PGO more impactful. It's also useful
for coverage, which sampling profiles cannot provide.
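
To illustrate the coverage point, here is a toy sketch in plain userspace
C. It is not the kernel's PGO code or clang's counter layout; the names
(toy_profile, toy_run, the BRANCH_* constants) are hypothetical and the
"sampler" is just a fixed-period tick. It only shows that a counter
bumped on every execution records rare paths exactly, while a periodic
sample can miss them entirely.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: two branches, each with an exact and a sampled count. */
enum { BRANCH_COMMON, BRANCH_RARE, NUM_BRANCHES };

struct toy_profile {
	uint64_t exact[NUM_BRANCHES];   /* bumped on every execution (instrumentation) */
	uint64_t sampled[NUM_BRANCHES]; /* bumped only when a sample tick fires */
};

/*
 * Execute `iterations` steps. The rare branch is taken once every
 * `rare_period` steps; a sample is recorded once every `sample_period`
 * steps and attributed to whichever branch is executing at that moment.
 */
void toy_run(struct toy_profile *p, uint64_t iterations,
	     uint64_t rare_period, uint64_t sample_period)
{
	/* Both periods are used as modulo divisors, so they must be non-zero. */
	assert(rare_period > 0 && sample_period > 0);

	for (uint64_t i = 1; i <= iterations; i++) {
		int branch = (i % rare_period == 0) ? BRANCH_RARE : BRANCH_COMMON;

		/* Instrumentation: every execution is counted. */
		p->exact[branch]++;

		/* Sampling: only executions that coincide with a tick are seen. */
		if (i % sample_period == 0)
			p->sampled[branch]++;
	}
}
```

Calling toy_run() on a zeroed struct toy_profile with 10000 iterations, a
rare_period of 1000 and a sample_period of 97 leaves exact[BRANCH_RARE]
at 10 and sampled[BRANCH_RARE] at 0, because no multiple of 97 below
10000 is also a multiple of 1000. A real sampling profiler is not
strictly periodic like this, so treat the numbers as an illustration of
the effect rather than a measurement.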

> > Signed-off-by: Sami Tolvanen <samitolvanen@xxxxxxxxxx>
> > Co-developed-by: Bill Wendling <morbo@xxxxxxxxxx>
> > Signed-off-by: Bill Wendling <morbo@xxxxxxxxxx>
> > Tested-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
> > Reviewed-by: Nick Desaulniers <ndesaulniers@xxxxxxxxxx>
> > Reviewed-by: Fangrui Song <maskray@xxxxxxxxxx>
> > ---
> >  Documentation/dev-tools/index.rst     |   1 +
> >  Documentation/dev-tools/pgo.rst       | 127 +++++++++
> >  MAINTAINERS                           |   9 +
> >  Makefile                              |   3 +
> >  arch/Kconfig                          |   1 +
> >  arch/x86/Kconfig                      |   1 +
> >  arch/x86/boot/Makefile                |   1 +
> >  arch/x86/boot/compressed/Makefile     |   1 +
> >  arch/x86/crypto/Makefile              |   4 +
> >  arch/x86/entry/vdso/Makefile          |   1 +
> >  arch/x86/kernel/vmlinux.lds.S         |   2 +
> >  arch/x86/platform/efi/Makefile        |   1 +
> >  arch/x86/purgatory/Makefile           |   1 +
> >  arch/x86/realmode/rm/Makefile         |   1 +
> >  arch/x86/um/vdso/Makefile             |   1 +
> >  drivers/firmware/efi/libstub/Makefile |   1 +
> >  include/asm-generic/vmlinux.lds.h     |  34 +++
> >  kernel/Makefile                       |   1 +
> >  kernel/pgo/Kconfig                    |  35 +++
> >  kernel/pgo/Makefile                   |   5 +
> >  kernel/pgo/fs.c                       | 389 ++++++++++++++++++++++++++
> >  kernel/pgo/instrument.c               | 189 +++++++++++++
> >  kernel/pgo/pgo.h                      | 203 ++++++++++++++
> >  scripts/Makefile.lib                  |  10 +
> >  24 files changed, 1022 insertions(+)
> >  create mode 100644 Documentation/dev-tools/pgo.rst
> >  create mode 100644 kernel/pgo/Kconfig
> >  create mode 100644 kernel/pgo/Makefile
> >  create mode 100644 kernel/pgo/fs.c
> >  create mode 100644 kernel/pgo/instrument.c
> >  create mode 100644 kernel/pgo/pgo.h
>
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -660,6 +660,9 @@ endif # KBUILD_EXTMOD
> >  # Defaults to vmlinux, but the arch makefile usually adds further targets
> >  all: vmlinux
> >
> > +CFLAGS_PGO_CLANG := -fprofile-generate
> > +export CFLAGS_PGO_CLANG
> > +
> >  CFLAGS_GCOV := -fprofile-arcs -ftest-coverage \
> >  	$(call cc-option,-fno-tree-loop-im) \
> >  	$(call cc-disable-warning,maybe-uninitialized,)
>
> And which of the many flags in noinstr disables this?
>
These flags aren't used with PGO. So there's no need to disable them.

> Basically I would like to NAK this whole thing until someone can
> adequately explain the interaction with noinstr and why we need those
> many lines of kernel code and can't simply use perf for this.

-bw