On 4/11/19 6:38 AM, David Gibson wrote: > On Thu, Apr 11, 2019 at 01:16:25PM +1000, Paul Mackerras wrote: >> On Wed, Apr 10, 2019 at 07:04:48PM +0200, Cédric Le Goater wrote: >>> When a P9 sPAPR VM boots, the CAS negotiation process determines which >>> interrupt mode to use (XICS legacy or XIVE native) and invokes a >>> machine reset to activate the chosen mode. >>> >>> To be able to switch from one mode to another, we introduce the >>> capability to release a KVM device without destroying the VM. The KVM >>> device interface is extended with a new 'release' operation which is >>> called when the file descriptor of the device is closed. >> >> I believe the release operation is not called until all of the mmaps >> using the fd are unmapped - which is a good thing for us, since it >> means the guest can't possibly be accessing the XIVE directly. yes. >> You might want to reword that last paragraph to mention that. ok. >>> Such operations are defined for the XICS-on-XIVE and the XIVE native >>> KVM devices. They clear the vCPU interrupt presenters that could be >>> attached and then destroy the device. >>> >>> This is not considered as a safe operation as the vCPUs are still >>> running and could be referencing the KVM device through their >>> presenters. To protect the system from any breakage, the kvmppc_xive >>> objects representing both KVM devices are now stored in an array under >>> the VM. Allocation is performed on first usage and memory is freed >>> only when the VM exits. >> >> One quick comment below: >> >>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c >>> index 480a3fc6b9fd..064a9f2ae678 100644 >>> --- a/arch/powerpc/kvm/book3s_xive.c >>> +++ b/arch/powerpc/kvm/book3s_xive.c >>> @@ -1100,11 +1100,19 @@ void kvmppc_xive_disable_vcpu_interrupts(struct kvm_vcpu *vcpu) >>> void kvmppc_xive_cleanup_vcpu(struct kvm_vcpu *vcpu) >>> { >>> struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu; >>> - struct kvmppc_xive *xive = xc->xive; >>> + struct kvmppc_xive *xive; >>> int i; >>> >>> + if (!kvmppc_xics_enabled(vcpu)) >>> + return; >> >> Should that be kvmppc_xive_enabled() rather than xics? > > I think I asked that on an earlier iteration, and the answer is no. > The names are confusing, but this file is all about xics-on-xive > rather than xive native. So here we're checking what's available from > the guest's point of view, so "xics", but most of the surrounding > functions are named "xive" because that's the backend. > yes. The relevant part is at the end of the kvmppc_xive_connect_vcpu() routine : int kvmppc_xive_connect_vcpu(struct kvm_device *dev, struct kvm_vcpu *vcpu, u32 cpu) { ... vcpu->arch.irq_type = KVMPPC_IRQ_XICS; return 0; } David suggested a few cleanups that we could do in the xics-on-xive device. We might want to introduce a KVMPPC_IRQ_XICS_ON_XIVE flag also. First, I would like to get rid of references to the kvmppc_xive struct and remove some useless attributes to improve locking. Once the XIVE native mode is merged, all kernels above 4.14 running on a P9 sPAPR guest will switch to XIVE and the xics-on-xive device will only be useful for nested. C.