Hi Paolo, On Wed, Mar 10, 2021 at 8:55 AM Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote: > > On 10/03/21 01:30, Jing Zhang wrote: > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > > index 383df23514b9..87dd62516c8b 100644 > > --- a/virt/kvm/kvm_main.c > > +++ b/virt/kvm/kvm_main.c > > @@ -3464,6 +3464,51 @@ static long kvm_vcpu_ioctl(struct file *filp, > > r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu); > > break; > > } > > + case KVM_STATS_GET_INFO: { > > + struct kvm_stats_info stats_info; > > + > > + r = -EFAULT; > > + stats_info.num_stats = VCPU_STAT_COUNT; > > + if (copy_to_user(argp, &stats_info, sizeof(stats_info))) > > + goto out; > > + r = 0; > > + break; > > + } > > + case KVM_STATS_GET_NAMES: { > > + struct kvm_stats_names stats_names; > > + > > + r = -EFAULT; > > + if (copy_from_user(&stats_names, argp, sizeof(stats_names))) > > + goto out; > > + r = -EINVAL; > > + if (stats_names.size < VCPU_STAT_COUNT * KVM_STATS_NAME_LEN) > > + goto out; > > + > > + r = -EFAULT; > > + if (copy_to_user(argp + sizeof(stats_names), > > + kvm_vcpu_stat_strings, > > + VCPU_STAT_COUNT * KVM_STATS_NAME_LEN)) > > The only reason to separate the strings in patch 1 is to pass them here. > But this is a poor API because it imposes a limit on the length of the > statistics, and makes that length part of the binary interface. Agreed. I am considering returning the length of stats name strings in the kvm_stats_info structure instead of exporting it as a constant in uapi, which would put no limit on the length of the stats name strings. > > I would prefer a completely different interface, where you have a file > descriptor that can be created and associated to a vCPU or VM (or even > to /dev/kvm). Having a file descriptor is important because the fd can > be passed to a less-privileged process that takes care of gathering the > metrics Separate file descriptor solution is very tempting. We are still considering it seriously. Our biggest concern is that the metrics gathering/handling process is not necessary running on the same node as the one file descriptor belongs to. It scales better to pass metrics data directly than to pass file descriptors. > > The result of reading the file descriptor could be either ASCII or > binary. IMO the real cost lies in opening and reading a multitude of > files rather than in the ASCII<->binary conversion. Agreed. > > The format could be one of the following: > > * binary: > > 4 bytes flags (always zero) > 4 bytes number of statistics > 4 bytes offset of the first stat description > 4 bytes offset of the first stat value > stat descriptions: > - 4 bytes for the type (for now always zero: uint64_t) > - 4 bytes for the flags (for now always zero) > - length of name > - name > statistics in 64-bit format > > * text: > > stat1_name uint64 123 > stat2_name uint64 456 > ... > > What do you think? The binary format presented above is very flexible. I understand why it is organized this way. In our situation, the metrics data could be pulled periodically as short as half second. They are used by different kinds of monitors/triggers/alerts. To enhance efficiency and reduce traffic caused by metrics passing, we treat all metrics info/data as two kinds. One is immutable information, which doesn't change in a given system boot. The other is mutable data (statistics data), which is pulled/transferred periodically at a high frequency. > > Paolo > Thanks, Jing