On Thu, Feb 01, 2018 at 09:15:15PM +0100, Radim Krčmář wrote:
> 2018-02-01 12:54-0500, Luiz Capitulino:
> >
> > Libvirt needs to know when a vCPU is halted. To get this information,
>
> I don't see why upper level management should care about that, a single
> bit about halted state that can be incorrect at the time it is processed
> seems of very limited use.

I don't see why, either.

I'm CCing libvir-list and the people involved in the code that
added halt state to libvirt domain statistics.

>
> (A much more sensible data point would be the fraction of time when VCPU
> was running or runnable, which is roughly what you get by sampling the
> halted state.)
>
> A halted vCPU it might even be halted in guest mode, so KVM doesn't know
> about that state (unless you force a VM exit), which would complicate
> the collection a bit more ... but really, what is the data being used
> for?
>
> User might care about the state, for obscure reasons, but that isn't a
> performance problem.
>
> > libvirt has started using the query-cpus command from QEMU. However,
> > if in kernel irqchip is in use, query-cpus will force all vCPUs
> > to user-space since they have to issue the KVM_GET_MP_STATE ioctl.
>
> Libvirt knows if KVM exits to userspace on halts, so it can just query
> QEMU in that case and in the other case, there is a very dirty
> "solution" that works on all architectures right now:
>
>   grep kvm_vcpu_block /proc/$vcpu_task/stack
>
> If you get something, the vcpu is halted in KVM.

Nice.

>
> > This has catastrophic implications to low-latency workloads like
> > KVM-RT and zero packet loss with DPDK. To make matters worse, there's
> > an OpenStack service called ceilometer that causes libvirt to
> > issue query-cpus every few minutes.
>
> I'd expect that people running these workloads can setup the system. :(
>
> I bet that ceilometer just mindlessly collects everything, so we should
> be able to configure libvirt to collect only some stats.  Either libvirt
> or upper layer would decide what is too expensive for its usefulness.

Yes.  Including expensive-to-collect halt state in
VIR_DOMAIN_STATS_VCPU is a serious performance regression in
libvirt.

>
> > The solution proposed in this patch is to export the vCPU
> > halted state in the already existing vcpu directory in sysfs.
> > This way, libvirt can read the vCPU halted state from sysfs and avoid
> > using the query-cpus command. This solution seems to be sufficient
> > for libvirt needs, but it has the following cons:
> >
> >  * vcpu information in sysfs lives in a debug directory, so
> >    libvirt would be basing its API on debug info
>
> (It pains me to say there probably already are tools that depend on
> kvm/debug.)
>
> It's slightly better than the stack hack, but needs more code in kernel
> and the interface is in a gray compatibility zone, so I'd like to know
> why does userspace do that in the first place.
>
> >  * Currently, only x86 supports the vcpu dir in sysfs, so
> >    we'd have to expand this to other archs (should be doable)
> >
> > If we agree that this solution is feasible, I'll work on extending
> > the vcpu debug information to other archs for my next posting.
> >
> > Signed-off-by: Luiz Capitulino <lcapitulino@xxxxxxxxxx>
> > ---
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > @@ -6273,6 +6273,7 @@ void kvm_arch_exit(void)
> >
> >  int kvm_vcpu_halt(struct kvm_vcpu *vcpu)
> >  {
> > +	kvm_vcpu_set_halted(vcpu);
>
> There is no point to care about !lapic_in_kernel().  I'd move the logic
> into vcpu_block() to be shared among all architectures.
>
> >  	++vcpu->stat.halt_exits;
> >  	if (lapic_in_kernel(vcpu)) {
> >  		vcpu->arch.mp_state = KVM_MP_STATE_HALTED;

-- 
Eduardo
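
P.S. For anyone who wants to try the /proc stack check quoted above from a management script, here is a minimal sketch in Python. It is only an illustration of the grep one-liner, not something libvirt does today: the helper names stack_shows_halt and vcpu_is_halted are hypothetical, the vCPU thread ID has to be obtained elsewhere, reading /proc/<tid>/stack normally needs root, and the answer can already be stale by the time it is used.

```python
from pathlib import Path

# Symbol named in the thread: a vCPU task blocked inside KVM is expected
# to show this function in its kernel stack.
BLOCK_SYMBOL = "kvm_vcpu_block"


def stack_shows_halt(stack_text: str, symbol: str = BLOCK_SYMBOL) -> bool:
    """Return True if any frame of a /proc/<tid>/stack dump mentions `symbol`.

    This is a plain substring match, the same thing the grep command does.
    """
    return any(symbol in line for line in stack_text.splitlines())


def vcpu_is_halted(vcpu_tid: int) -> bool:
    """Hypothetical helper: read the vCPU thread's kernel stack and check it.

    Assumes the caller already knows the vCPU thread ID and has permission
    to read /proc/<tid>/stack (typically root only).
    """
    stack = Path(f"/proc/{vcpu_tid}/stack").read_text()
    return stack_shows_halt(stack)
```

Keeping the text matching separate from the /proc read makes the check easy to exercise on captured stack dumps without touching a live guest.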