Hi Ingo,

On Wed, Oct 11, 2023 at 11:03 PM Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>
>
> * Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> > * How to use it
> >
> > To get precise memory access samples, users can use `perf mem record`
> > command to utilize those events supported by their architecture.  Intel
> > machines would work best as they have dedicated memory access events but
> > they would have a filter to ignore low latency loads like less than 30
> > cycles (use --ldlat option to change the default value).
> >
> >     # To get memory access samples in kernel for 1 second (on Intel)
> >     $ sudo perf mem record -a -K --ldlat=4 -- sleep 1
> >
> >     # Similar for the AMD (but it requires 6.3+ kernel for BPF filters)
> >     $ sudo perf mem record -a --filter 'mem_op == load, ip > 0x8000000000000000' -- sleep 1
>
> BTW., it would be nice for 'perf mem record' to just do the right thing on
> whatever machine it is running on.
>
> Also, why are BPF filters required - due to the IP filtering of mem-load
> events?

Right, because AMD uses IBS for precise events and it doesn't
have a filtering feature.

>
> Could we perhaps add an IP filter to perf events to get this built-in?
> Perhaps attr->exclude_user would achieve something similar?

Unfortunately IBS doesn't support privilege filters IIUC.  Maybe
we could add a general filtering logic in the NMI handler but I'm
afraid it can complicate the code and maybe slow it down a bit.
Probably it's ok to have only a simple privilege filter by IP range.

>
> > In perf report, it's just a matter of selecting new sort keys: 'type'
> > and 'typeoff'.  The 'type' shows name of the data type as a whole while
> > 'typeoff' shows name of the field in the data type.  I found it useful
> > to use it with --hierarchy option to group relevant entries in the same
> > level.
> >
> >     $ sudo perf report -s type,typeoff --hierarchy --stdio
> >     ...
> >     #
> >     #    Overhead  Data Type / Data Type Offset
> >     # ...........  ............................
> >     #
> >         23.95%     (stack operation)
> >            23.95%     (stack operation) +0 (no field)
> >         23.43%     (unknown)
> >            23.43%     (unknown) +0 (no field)
> >         10.30%     struct pcpu_hot
> >             4.80%     struct pcpu_hot +0 (current_task)
> >             3.53%     struct pcpu_hot +8 (preempt_count)
> >             1.88%     struct pcpu_hot +12 (cpu_number)
> >             0.07%     struct pcpu_hot +24 (top_of_stack)
> >             0.01%     struct pcpu_hot +40 (softirq_pending)
> >          4.25%     struct task_struct
> >             1.48%     struct task_struct +2036 (rcu_read_lock_nesting)
> >             0.53%     struct task_struct +2040 (rcu_read_unlock_special.b.blocked)
> >             0.49%     struct task_struct +2936 (cred)
> >             0.35%     struct task_struct +3144 (audit_context)
> >             0.19%     struct task_struct +46 (flags)
> >             0.17%     struct task_struct +972 (policy)
> >             0.15%     struct task_struct +32 (stack)
> >             0.15%     struct task_struct +8 (thread_info.syscall_work)
> >             0.10%     struct task_struct +976 (nr_cpus_allowed)
> >             0.09%     struct task_struct +2272 (mm)
> >         ...
>
> This looks really useful! :)

Thanks,
Namhyung