On Fri, May 27, 2022 at 07:09:59PM +0800, Zhang Yuchen wrote:
> Add /proc/syscalls to display percpu syscall count.
>
> We need a less resource-intensive way to count syscall per cpu
> for system problem location.

Why?  How is this less resource intensive than perf?

> There is a similar utility syscount in the BCC project, but syscount
> has a high performance cost.

What is that cost?

> The following is a comparison on the same machine, using UnixBench
> System Call Overhead:
>
>   ┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
>   ┃ Change        ┃ Unixbench Score ┃ Loss   ┃
>   ┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
>   │ no change     │ 1072.6          │ ---    │
>   │ syscall count │ 982.5           │ 8.40%  │
>   │ bpf syscount  │ 614.2           │ 42.74% │
>   └───────────────┴─────────────────┴────────┘

Again, what about perf?

> UnixBench System Call Use sys_gettid to test, this system call only reads
> one variable, so the performance penalty seems large. When tested with
> fork, the test scores were almost the same.
>
> So the conclusion is that it does not have a significant impact on system
> call performance.

8% is huge for a system-wide decrease in performance.  Who would ever
use this?

> This function depends on CONFIG_FTRACE_SYSCALLS because the system call
> number is stored in syscall_metadata.
>
> Signed-off-by: Zhang Yuchen <zhangyuchen.lcr@xxxxxxxxxxxxx>
> ---
>  Documentation/filesystems/proc.rst       | 28 +++++++++
>  arch/arm64/include/asm/syscall_wrapper.h |  2 +-
>  arch/s390/include/asm/syscall_wrapper.h  |  4 +-
>  arch/x86/include/asm/syscall_wrapper.h   |  2 +-
>  fs/proc/Kconfig                          |  7 +++
>  fs/proc/Makefile                         |  1 +
>  fs/proc/syscall.c                        | 79 ++++++++++++++++++++++++
>  include/linux/syscalls.h                 | 51 +++++++++++++--
>  8 files changed, 165 insertions(+), 9 deletions(-)
>  create mode 100644 fs/proc/syscall.c
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 1bc91fb8c321..80394a98a192 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -686,6 +686,7 @@ files are there, and which are missing.
>   fs           File system parameters, currently nfs/exports (2.4)
>   ide          Directory containing info about the IDE subsystem
>   interrupts   Interrupt usage
> + syscalls     Syscall count for each cpu
>   iomem        Memory map (2.4)
>   ioports      I/O port usage
>   irq          Masks for irq to cpu affinity (2.4)(smp?)
> @@ -1225,6 +1226,33 @@ Provides counts of softirq handlers serviced since boot time, for each CPU.
>       HRTIMER:          0          0          0          0
>           RCU:       1678       1769       2178       2250
>
> +syscalls
> +~~~~~~~~
> +
> +Provides counts of syscall since boot time, for each cpu.
> +
> +::
> +
> +    > cat /proc/syscalls
> +              CPU0       CPU1       CPU2       CPU3
> +      0:      3743       3099       3770       3242   sys_read
> +      1:       222        559        822        522   sys_write
> +      2:         0          0          0          0   sys_open
> +      3:      6481      18754      12077       7349   sys_close
> +      4:     11362      11120      11343      10665   sys_newstat
> +      5:      5224      13880       8578       5971   sys_newfstat
> +      6:      1228       1269       1459       1508   sys_newlstat
> +      7:        90         43         64         67   sys_poll
> +      8:      1635       1000       2071       1161   sys_lseek
> +    .... omit the middle line ....
> +    441:         0          0          0          0   sys_epoll_pwait2
> +    442:         0          0          0          0   sys_mount_setattr
> +    443:         0          0          0          0   sys_quotactl_fd
> +    447:         0          0          0          0   sys_memfd_secret
> +    448:         0          0          0          0   sys_process_mrelease
> +    449:         0          0          0          0   sys_futex_waitv
> +    450:         0          0          0          0   sys_set_mempolicy_home_node

So for systems with large numbers of CPUs, these are huge lines?

Have you tested this on large systems?  If so, how big?

thanks,

greg k-h
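
P.S. For reference, a rough sketch of how a per-CPU syscall count can
already be pulled out of existing tooling; this assumes the
raw_syscalls:sys_enter tracepoint is available (i.e. the same
CONFIG_FTRACE_SYSCALLS dependency your patch has), and it is meant as an
illustration for the comparison I'm asking about, not a measured result:

	# system-wide (-a), per-CPU (-A) count of syscall entries over 10 seconds
	perf stat -a -A -e raw_syscalls:sys_enter -- sleep 10

Unlike an always-on counter, this only costs anything while the
measurement is running.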