On Tue, Aug 11, 2020 at 12:58:47AM +1000, Eugene Lubarsky wrote:
> This is an idea for substantially reducing the number of syscalls needed
> by monitoring tools whilst mostly re-using the existing API.
>
> The proposed files in this proof-of-concept patch set are:
>
> * /proc/all/stat
>       A stat line for each process in the existing format.
>
> * /proc/all/statm
>       statm lines but starting with a PID column.
>
> * /proc/all/status
>       status info for all processes in the existing format.
>
> * /proc/all/io
>       The existing /proc/pid/io data but formatted as a single line for
>       each process, similarly to stat/statm, with a PID column added.
>
> * /proc/all/statx
>       Gathers info from stat, statm and io; the purpose is actually
>       not so much to reduce syscalls but to help userspace be more
>       efficient by not having to store data in e.g. hashtables in order
>       to gather it from separate /proc/all/ files.
>
>       The format proposed here starts with the unchanged stat line
>       and begins the other info with a few characters, repeating for
>       each process:
>
>       ...
>       25 (cat) R 1 1 0 0 -1 4194304 185 0 16 0 2 0 0 0 20 ...
>       m 662 188 167 5 0 112 0
>       io 4292 0 12 0 0 0 0
>       ...
>
>
> There has been a proposal with some overlapping goals: /proc/task-diag
> (https://github.com/avagin/linux-task-diag), but I'm not sure about
> its current status.

I rebased the task_diag patches on top of v5.8:
https://github.com/avagin/linux-task-diag/tree/v5.8-task-diag

/proc/pid files have three major limitations:
* Requires at least three syscalls per process per file
  open(), read(), close()
* Variety of formats, mostly text based
  The kernel spends time encoding binary data into a text format, and
  then tools like top and ps spend time decoding it back into a binary
  format.
* Sometimes slow due to extra attributes
  For example, /proc/PID/smaps contains a lot of useful information
  about memory mappings and memory consumption for each of them.
  But even if we don't need the memory consumption fields, the kernel
  still spends time collecting this information.

More details and numbers are in this article:
https://avagin.github.io/how-fast-is-procfs

This new interface removes only one of these limitations, whereas
task_diag removes all of them.

I also compared how fast each of these interfaces is.

The test environment:
CPU: Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
RAM: 16GB
kernel: v5.8 with task_diag and /proc/all patches.
100K processes:
$ ps ax | wc -l
10228

$ time cat /proc/all/status > /dev/null

real    0m0.577s
user    0m0.017s
sys     0m0.559s

task_proc_all is used to read /proc/pid/status for all tasks:
https://github.com/avagin/linux-task-diag/blob/master/tools/testing/selftests/task_diag/task_proc_all.c

$ time ./task_proc_all status
tasks: 100230

real    0m0.924s
user    0m0.054s
sys     0m0.858s

/proc/all/status is about 40% faster than /proc/*/status.

Now let's take a look at the perf output:

$ time perf record -g cat /proc/all/status > /dev/null
$ perf report
-   98.08%     1.38%  cat      [kernel.vmlinux]  [k] entry_SYSCALL_64
   - 96.70% entry_SYSCALL_64
      - do_syscall_64
         - 94.97% ksys_read
            - 94.80% vfs_read
               - 94.58% proc_reg_read
                  - seq_read
                     - 87.95% proc_pid_status
                        + 13.10% seq_put_decimal_ull_width
                        - 11.69% task_mem
                           + 9.48% seq_put_decimal_ull_width
                        + 10.63% seq_printf
                        - 10.35% cpuset_task_status_allowed
                           + seq_printf
                        - 9.84% render_sigset_t
                             1.61% seq_putc
                           + 1.61% seq_puts
                        + 4.99% proc_task_name
                        + 4.11% seq_puts
                        - 3.76% render_cap_t
                             2.38% seq_put_hex_ll
                           + 1.25% seq_puts
                          2.64% __task_pid_nr_ns
                        + 1.54% get_task_mm
                        + 1.34% __lock_task_sighand
                        + 0.70% from_kuid_munged
                          0.61% get_task_cred
                          0.56% seq_putc
                          0.52% hugetlb_report_usage
                          0.52% from_kgid_munged
                     + 4.30% proc_all_next
                     + 0.82% _copy_to_user

We can see that the kernel spent more than 50% of the time encoding
binary data into a text format.
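To make the first two limitations concrete, here is a minimal sketch of
what a tool has to do per task with the existing /proc/PID/status files:
one open(), one read() and one close() per task, followed by decoding the
text back into numbers. It is written in Python for brevity and is not
taken from task_proc_all.c. The names read_all_status, parse_status and
vm_rss_kb and the proc_root parameter are made up for this illustration,
and it assumes each status file fits in a single read().

```python
import os


def parse_status(text):
    """Split 'Key:<tab>value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value.strip()
    return fields


def vm_rss_kb(fields):
    """Decode e.g. 'VmRSS: 752 kB' back into an integer.

    Returns None when the field is absent (kernel threads have no mm).
    """
    value = fields.get("VmRSS")
    return int(value.split()[0]) if value else None


def read_all_status(proc_root="/proc"):
    """Read <proc_root>/<pid>/status for every numeric directory.

    Returns (tasks, syscalls). tasks maps pid -> parsed fields. syscalls
    counts only the open/read/close calls issued for the status files,
    not the directory listing.
    """
    tasks = {}
    syscalls = 0
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a PID directory
        path = os.path.join(proc_root, name, "status")
        try:
            fd = os.open(path, os.O_RDONLY)
        except OSError:
            continue  # the task exited between listdir() and open()
        syscalls += 3  # open(), read(), close()
        try:
            # Assumption: the whole file fits in one read(); a robust
            # reader would loop until read() returns b"".
            data = os.read(fd, 16384)
        except OSError:
            data = b""  # the task exited while we were reading
        finally:
            os.close(fd)
        if data:
            tasks[int(name)] = parse_status(data.decode("utf-8", "replace"))
    return tasks, syscalls
```

Calling read_all_status() on a live system should report three syscalls
per task. That per-task cost is what /proc/all/status avoids, while the
parsing done in parse_status() and vm_rss_kb() is still needed with any
text-based format.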
Now let's see how fast task_diag is:

$ time ./task_diag_all all -c -q

real    0m0.087s
user    0m0.001s
sys     0m0.082s

Maybe we need to resurrect the task_diag series instead of inventing
another less-effective interface...

Thanks,
Andrei

>
>
>
> Best Wishes,
>
> Eugene
>
>
> Eugene Lubarsky (5):
>   fs/proc: Introduce /proc/all/stat
>   fs/proc: Introduce /proc/all/statm
>   fs/proc: Introduce /proc/all/status
>   fs/proc: Introduce /proc/all/io
>   fs/proc: Introduce /proc/all/statx
>
>  fs/proc/base.c     | 215 +++++++++++++++++++++++++++++++++++++++++++--
>  fs/proc/internal.h |   1 +
>  fs/proc/root.c     |   1 +
>  3 files changed, 210 insertions(+), 7 deletions(-)
>
> --
> 2.25.1
>