On Thu, Apr 18, 2019 at 12:39:39PM +0200, Cédric Le Goater wrote: > The KVM XICS-over-XIVE device and the proposed KVM XIVE native device > implement an IRQ space for the guest using the generic IPI interrupts > of the XIVE IC controller. These interrupts are allocated at the OPAL > level and "mapped" into the guest IRQ number space in the range 0-0x1FFF. > Interrupt management is performed in the XIVE way: using loads and > stores on the addresses of the XIVE IPI interrupt ESB pages. > > Both KVM devices share the same internal structure caching information > on the interrupts, among which the xive_irq_data struct containing the > addresses of the IPI ESB pages and an extra one in case of pass-through. > The later contains the addresses of the ESB pages of the underlying HW > controller interrupts, PHB4 in all cases for now. > > A guest, when running in the XICS legacy interrupt mode, lets the KVM > XICS-over-XIVE device "handle" interrupt management, that is to > perform the loads and stores on the addresses of the ESB pages of the > guest interrupts. However, when running in XIVE native exploitation > mode, the KVM XIVE native device exposes the interrupt ESB pages to > the guest and lets the guest perform directly the loads and stores. > > The VMA exposing the ESB pages make use of a custom VM fault handler > which role is to populate the VMA with appropriate pages. When a fault > occurs, the guest IRQ number is deduced from the offset, and the ESB > pages of associated XIVE IPI interrupt are inserted in the VMA (using > the internal structure caching information on the interrupts). > > Supporting device passthrough in the guest running in XIVE native > exploitation mode adds some extra refinements because the ESB pages > of a different HW controller (PHB4) need to be exposed to the guest > along with the initial IPI ESB pages of the XIVE IC controller. But > the overall mechanic is the same. > > When the device HW irqs are mapped into or unmapped from the guest > IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped() > and kvmppc_xive_clr_mapped(), are called to record or clear the > passthrough interrupt information and to perform the switch. > > The approach taken by this patch is to clear the ESB pages of the > guest IRQ number being mapped and let the VM fault handler repopulate. > The handler will insert the ESB page corresponding to the HW interrupt > of the device being passed-through or the initial IPI ESB page if the > device is being removed. One comment below: > @@ -257,6 +289,13 @@ static int kvmppc_xive_native_mmap(struct kvm_device *dev, > > vma->vm_flags |= VM_IO | VM_PFNMAP; > vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot); > + > + /* > + * Grab the KVM device file address_space to be able to clear > + * the ESB pages mapping when a device is passed-through into > + * the guest. > + */ > + xive->mapping = vma->vm_file->f_mapping; I'm worried about the lifetime of this pointer. At the least you should clear it in the release function for the device, when you get one. It looks like you should hold the xive->mapping_lock around the clearing. I think that should be OK then since the release function won't get called until all of the mmaps of the fd have been destroyed, as I understand things. Paul.