On 4/15/19 11:25 AM, Paul Mackerras wrote: > On Mon, Apr 15, 2019 at 01:32:19PM +1000, David Gibson wrote: >> On Tue, Apr 09, 2019 at 04:13:47PM +0200, Cédric Le Goater wrote: >>> When the VM boots, the CAS negotiation process determines which >>> interrupt mode to use and invokes a machine reset. At that time, any >>> links to the previous KVM interrupt device should be 'destroyed' >>> before the new chosen one is created. >>> >>> To perform the necessary cleanups in KVM, we extend the KVM device >>> interface with a new 'release' operation which is called when the file >>> descriptor of the device is closed. >>> >>> Such operations are defined for the XICS-on-XIVE and the XIVE native >>> KVM devices. They clear the vCPU interrupt presenters that could be >>> attached and then destroy the device. >>> >>> Signed-off-by: Cédric Le Goater <clg@xxxxxxxx> >>> --- >>> include/linux/kvm_host.h | 1 + >>> arch/powerpc/kvm/book3s_xive.c | 50 +++++++++++++++++++++++++-- >>> arch/powerpc/kvm/book3s_xive_native.c | 23 ++++++++++++ >>> virt/kvm/kvm_main.c | 13 +++++++ >>> 4 files changed, 85 insertions(+), 2 deletions(-) >>> >>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h >>> index 831d963451d8..3b444620d8fc 100644 >>> --- a/include/linux/kvm_host.h >>> +++ b/include/linux/kvm_host.h >>> @@ -1246,6 +1246,7 @@ struct kvm_device_ops { >>> long (*ioctl)(struct kvm_device *dev, unsigned int ioctl, >>> unsigned long arg); >>> int (*mmap)(struct kvm_device *dev, struct vm_area_struct *vma); >>> + void (*release)(struct kvm_device *dev); >>> }; >>> >>> void kvm_device_get(struct kvm_device *dev); >>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c >>> index 4d4e1730de84..ba777db849d7 100644 >>> --- a/arch/powerpc/kvm/book3s_xive.c >>> +++ b/arch/powerpc/kvm/book3s_xive.c >>> @@ -1100,11 +1100,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu) >>> void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu) >>> { >>> struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; >>> - struct kvmppc_xive *xive = xc->xive; >>> + struct kvmppc_xive *xive; >>> int i; >>> >>> + if (!kvmppc_xics_enabled(vcpu)) >>> + return; >>> + >>> + if (!xc) >>> + return; >>> + >>> pr_devel("cleanup_vcpu(cpu=%d)\n", xc->server_num); >>> >>> + xive = xc->xive; >>> + >>> /* Ensure no interrupt is still routed to that VP */ >>> xc->valid = false; >>> kvmppc_xive_disable_vcpu_interrupts(vcpu); >>> @@ -1141,6 +1149,10 @@ void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu) >>> } >>> /* Free the VP */ >>> kfree(xc); >>> + >>> + /* Cleanup the vcpu */ >>> + vcpu->arch.irq_type = KVMPPC_IRQ_DEFAULT; >>> + vcpu->arch.xive_vcpu = NULL; >>> } >>> >>> int kvmppc_xive_connect_vcpu(struct kvm_device *dev, >>> @@ -1158,7 +1170,7 @@ int kvmppc_xive_connect_vcpu(struct kvm_device *dev, >>> } >>> if (xive->kvm != vcpu->kvm) >>> return -EPERM; >>> - if (vcpu->arch.irq_type) >>> + if (vcpu->arch.irq_type != KVMPPC_IRQ_DEFAULT) >>> return -EBUSY; >>> if (kvmppc_xive_find_server(vcpu->kvm, cpu)) { >>> pr_devel("Duplicate !\n"); >>> @@ -1855,6 +1867,39 @@ static void kvmppc_xive_free(struct kvm_device *dev) >>> kfree(dev); >>> } >>> >>> +static void kvmppc_xive_release(struct kvm_device *dev) >>> +{ >>> + struct kvmppc_xive *xive = dev->private; >>> + struct kvm *kvm = xive->kvm; >>> + struct kvm_vcpu *vcpu; >>> + int i; >>> + >>> + pr_devel("Releasing xive device\n"); >>> + >>> + /* >>> + * When releasing the KVM device fd, the vCPUs can still be >>> + * running and we should clean up the vCPU interrupt >>> + * presenters first. >>> + */ >>> + if (atomic_read(&kvm->online_vcpus) != 0) { >> >> What prevents online_vcpus from becoming non-zero after this test, but >> before the kvmppc_xive_free()? >> >> Is the test actually necessary? The operations below should be safe >> even if there are no online cpus, yes? > > Right... Similarly, the kick_all_cpus_sync() without anything having > been done before it that we want the other vcpus to notice made me > wonder what the point of it was. In other places where it is used we > have done something such as set kvm->arch.mmu_ready to 0 first. This part is more dubious. It comes from my understanding of the routine kvm_arch_destroy_vm() that makes sure all IPIs have been handled before clearing the VCPUs structures. commit e17769eb8c89 is a bit cryptic and looks like an optimization that the release operation can ignore ? Thanks, C. >>> + /* >>> + * call kick_all_cpus_sync() to ensure that all CPUs >>> + * have executed any pending interrupts >>> + */ >>> + if (is_kvmppc_hv_enabled(kvm)) >>> + kick_all_cpus_sync(); > > Paul. >