Re: [PATCH v4] pgo: add clang's Profile Guided Optimization infrastructure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On Wed, Jan 13, 2021 at 8:07 PM Nick Desaulniers
> <ndesaulniers@xxxxxxxxxx> wrote:
> >
> > On Wed, Jan 13, 2021 at 12:55 PM Nathan Chancellor
> > <natechancellor@xxxxxxxxx> wrote:
> > >
> > > However, I see an issue with actually using the data:
> > >
> > > $ sudo -s
> > > # mount -t debugfs none /sys/kernel/debug
> > > # cp -a /sys/kernel/debug/pgo/profraw vmlinux.profraw
> > > # chown nathan:nathan vmlinux.profraw
> > > # exit
> > > $ tc-build/build/llvm/stage1/bin/llvm-profdata merge --output=vmlinux.profdata vmlinux.profraw
> > > warning: vmlinux.profraw: Invalid instrumentation profile data (bad magic)
> > > error: No profiles could be merged.
> > >
> > > Am I holding it wrong? :) Note, this is virtualized, I do not have any
> > > "real" x86 hardware that I can afford to test on right now.
> >
> > Same.
> >
> > I think the magic calculation in this patch may differ from upstream
> > llvm: https://github.com/llvm/llvm-project/blob/49142991a685bd427d7e877c29c77371dfb7634c/llvm/include/llvm/ProfileData/SampleProf.h#L96-L101
> 
> Err...it looks like it was the padding calculation.  With that fixed
> up, we can query the profile data to get insights on the most heavily
> called functions.  Here's what my top 20 are (reset, then watch 10
> minutes worth of cat videos on youtube while running `find /` and
> sleeping at my desk).  Anything curious stand out to anyone?

Hello world from my personal laptop whose kernel was rebuilt with
profiling data!  Wow, I can run `find /` and watch cat videos on youtube
so fast!  Users will love this! /s

Checking the sections sizes of .text.hot. and .text.unlikely. looks
good!

> 
> $ llvm-profdata show -topn=20 /tmp/vmlinux.profraw
> Instrumentation level: IR  entry_first = 0
> Total functions: 48970
> Maximum function count: 62070879
> Maximum internal block count: 83221158
> Top 20 functions with the largest internal block counts:
>   drivers/tty/n_tty.c:n_tty_write, max count = 83221158
>   rcu_read_unlock_strict, max count = 62070879
>   _cond_resched, max count = 25486882
>   rcu_all_qs, max count = 25451477
>   drivers/cpuidle/poll_state.c:poll_idle, max count = 23618576
>   _raw_spin_unlock_irqrestore, max count = 18874121
>   drivers/cpuidle/governors/menu.c:menu_select, max count = 18721624
>   _raw_spin_lock_irqsave, max count = 18509161
>   memchr, max count = 15525452
>   _raw_spin_lock, max count = 15484254
>   __mod_memcg_state, max count = 14604619
>   __mod_memcg_lruvec_state, max count = 14602783
>   fs/ext4/hash.c:str2hashbuf_signed, max count = 14098424
>   __mod_lruvec_state, max count = 12527154
>   __mod_node_page_state, max count = 12525172
>   native_sched_clock, max count = 8904692
>   sched_clock_cpu, max count = 8895832
>   sched_clock, max count = 8894627
>   kernel/entry/common.c:exit_to_user_mode_prepare, max count = 8289031
>   fpregs_assert_state_consistent, max count = 8287198
> 
> -- 
> Thanks,
> ~Nick Desaulniers
> 



[Index of Archives]     [Linux&nblp;USB Development]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite Secrets]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux