Re: [PATCH 19/19] KVM: introduce a KVM_DELETE_DEVICE ioctl

Cédric Le Goater <clg@xxxxxxxx> · Wed, 23 Jan 2019 19:39:55 +0100

On 1/22/19 6:42 AM, Paul Mackerras wrote:
> On Mon, Jan 07, 2019 at 08:10:06PM +0100, Cédric Le Goater wrote:
>> This will be used to destroy the KVM XICS or XIVE device when the
>> sPAPR machine is reset. When the VM boots, the CAS negotiation
>> process will determine which interrupt mode to use and the appropriate
>> KVM device will then be created.
> 
> What would be the consequence if we didn't destroy the device?

If we don't destroy the device, we have to keep it available under
the KVM PPC structures, VM and vCPUs. I think the changes needed to
have two interrupt devices under the VM would be significant. We would
also need a way to activate one or the other depending on the
interrupt mode chosen by CAS. In other words, it moves all the
interrupt mode policy from QEMU to KVM. It's possible of course, but
I would prefer to leave the ugly details in QEMU.

Let's suppose now that we keep the device alive but disconnect the
presenters from it, and from the VM also. We would have an unused
device in the VM. We would need a way to keep a handle on it (an fd,
certainly) and a KVM interface to soft reset a partially initialized
KVM device. That's another option.

It seemed easier to do a hard reset: create/destroy.

> The reason I ask is that we will have to be much more careful about
> memory allocation lifetimes with this patch. 

Yes. Bad refcounting will crash the host kernel.

> Having KVM devices last
> until the KVM instance is destroyed means that we generally avoid
> use-after-free bugs.  With this patch we will have to do a careful
> analysis of the lifetime of the xive structures vs. possible accesses
> on other threads to prove there are no use-after-free bugs.
> 
> For example, it is not sufficient to set any pointers in struct kvm or
> struct kvm_vcpu that point into xive structures to NULL before freeing
> the structures.  There could be code on another CPU that has read the
> pointer value before you set it to NULL and then goes and accesses it
> after you have freed it.  You need to prove that can't happen,
> possibly using some sort of explicit synchronization that ensures that
> no other CPU could still be accessing the structure at the time when
> you free it.  RCU can help with this, but in general means you need
> RCU synchronization primitives (rcu_read_lock() etc.) at all the
> places where you use the pointer, which I don't think you currently
> have.

No, indeed. I have overlooked the synchronization aspect.

> If there is a good fundamental reason why this can't happen, even
> though you don't have explicit synchronization, then at a minimum you
> need to explain that in the patch description, and ideally also in
> code comments.

OK. I left that patch at the end for a reason: it needs more care.

Thanks,

C.