Re: [RFC PATCH 0/2] Add vcpu debugfs to record statistical data for every single

"chenxiang (M)" <chenxiang66@xxxxxxxxxxxxx> · Wed, 7 Dec 2022 18:16:17 +0800

On 2022/12/7 16:21, Marc Zyngier wrote:
On Tue, 06 Dec 2022 12:58:26 +0000,
chenxiang <chenxiang66@xxxxxxxxxxxxx> wrote:
From: Xiang Chen <chenxiang66@xxxxxxxxxxxxx>

Currently KVM only records statistical data aggregated across all vcpus,
but we often want to know the statistical data for a single vcpu, and there
is no debugfs for that. So add vcpu debugfs to record statistical data for
every single vcpu, and also enable vcpu debugfs for arm64.

After the change, the vcpu debugfs entries are as follows (the VM has 4
vcpus):

[root@centos kvm]# cd 2025-14/
[root@centos 2025-14]# ls
blocking                halt_wait_hist             vcpu0
exits                   halt_wait_ns               vcpu1
halt_attempted_poll     halt_wakeup                vcpu2
halt_poll_fail_hist     hvc_exit_stat              vcpu3
halt_poll_fail_ns       mmio_exit_kernel           vgic-state
halt_poll_invalid       mmio_exit_user             wfe_exit_stat
halt_poll_success_hist  remote_tlb_flush           wfi_exit_stat
halt_poll_success_ns    remote_tlb_flush_requests
halt_successful_poll    signal_exits
[root@centos 2025-14]# cat exits
124689
[root@centos 2025-14]# cat vcpu0/exits
52966
[root@centos 2025-14]# cat vcpu1/exits
21549
[root@centos 2025-14]# cat vcpu2/exits
43864
[root@centos 2025-14]# cat vcpu3/exits
6572
[root@centos 2025-14]# ls vcpu0
blocking             halt_poll_invalid       halt_wait_ns      pid
exits                halt_poll_success_hist  halt_wakeup       signal_exits
halt_attempted_poll  halt_poll_success_ns    hvc_exit_stat     wfe_exit_stat
halt_poll_fail_hist  halt_successful_poll    mmio_exit_kernel  wfi_exit_stat
halt_poll_fail_ns    halt_wait_hist          mmio_exit_user

This is yet another example of "KVM doesn't give me the stats I want,
so let's pile more stats on top". This affects every user (counters
are not free), and hardly benefits anyone.

Currently there is already debugfs at the top level, but it only records
statistical data for the whole VM, which is not helpful for debugging.
For example, the file "exits" records the number of VM exits summed over
all vcpus. We previously hit an issue where something was wrong with the
thread of one vcpu: it no longer took VM exits while the other vcpus were
normal. We could not get anything useful from the current debugfs, because
the total number of exits kept increasing.
Compared with the current debugfs, I think it is more useful to know the
statistical data for every vcpu, and it benefits more.
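
To illustrate the scenario, here is a minimal sketch (not part of the
patch) of how per-vcpu counters could expose a stuck vcpu that the
aggregate hides. It assumes the vcpuN/exits layout shown in the listing
above; the function names and the second snapshot are hypothetical and
only for illustration.

```python
from pathlib import Path


def read_vcpu_exits(vm_dir):
    """Return {vcpu_name: exits} for every vcpuN/exits file under vm_dir.

    vm_dir is assumed to be a per-VM debugfs directory laid out like the
    listing above (vcpu0/exits, vcpu1/exits, ...).
    """
    counts = {}
    for exits_file in Path(vm_dir).glob("vcpu*/exits"):
        counts[exits_file.parent.name] = int(exits_file.read_text().strip())
    return counts


def stalled_vcpus(before, after):
    """Return the vcpus whose exit count did not advance between snapshots."""
    return sorted(name for name in before if after.get(name, 0) <= before[name])


# First snapshot: the per-vcpu values from the listing above.
before = {"vcpu0": 52966, "vcpu1": 21549, "vcpu2": 43864, "vcpu3": 6572}
# Second snapshot: made-up values in which vcpu3 is stuck.
after = {"vcpu0": 53100, "vcpu1": 21700, "vcpu2": 44010, "vcpu3": 6572}

print(stalled_vcpus(before, after))                 # ['vcpu3']
print(sum(after.values()) > sum(before.values()))   # True: the total still grows
```

The aggregate keeps growing because the other vcpus still exit, so only
the per-vcpu view shows that vcpu3 has stopped.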

How about you instead add trace hooks that allow you to plumb your
own counters using BPF or another kernel module? This is what this stuff
is for, and we really don't need to create more ABI around that. At
least, the other stat-hungry folks out there would also be able to get
their own stuff, and normal users wouldn't be affected by it.

Thanks,

	M.