On 4/25/19 9:07 AM, Paul Mackerras wrote: > On Thu, Apr 18, 2019 at 12:39:39PM +0200, Cédric Le Goater wrote: >> The KVM XICS-over-XIVE device and the proposed KVM XIVE native device >> implement an IRQ space for the guest using the generic IPI interrupts >> of the XIVE IC controller. These interrupts are allocated at the OPAL >> level and "mapped" into the guest IRQ number space in the range 0-0x1FFF. >> Interrupt management is performed in the XIVE way: using loads and >> stores on the addresses of the XIVE IPI interrupt ESB pages. >> >> Both KVM devices share the same internal structure caching information >> on the interrupts, among which the xive_irq_data struct containing the >> addresses of the IPI ESB pages and an extra one in case of pass-through. >> The later contains the addresses of the ESB pages of the underlying HW >> controller interrupts, PHB4 in all cases for now. >> >> A guest, when running in the XICS legacy interrupt mode, lets the KVM >> XICS-over-XIVE device "handle" interrupt management, that is to >> perform the loads and stores on the addresses of the ESB pages of the >> guest interrupts. However, when running in XIVE native exploitation >> mode, the KVM XIVE native device exposes the interrupt ESB pages to >> the guest and lets the guest perform directly the loads and stores. >> >> The VMA exposing the ESB pages make use of a custom VM fault handler >> which role is to populate the VMA with appropriate pages. When a fault >> occurs, the guest IRQ number is deduced from the offset, and the ESB >> pages of associated XIVE IPI interrupt are inserted in the VMA (using >> the internal structure caching information on the interrupts). >> >> Supporting device passthrough in the guest running in XIVE native >> exploitation mode adds some extra refinements because the ESB pages >> of a different HW controller (PHB4) need to be exposed to the guest >> along with the initial IPI ESB pages of the XIVE IC controller. But >> the overall mechanic is the same. >> >> When the device HW irqs are mapped into or unmapped from the guest >> IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped() >> and kvmppc_xive_clr_mapped(), are called to record or clear the >> passthrough interrupt information and to perform the switch. >> >> The approach taken by this patch is to clear the ESB pages of the >> guest IRQ number being mapped and let the VM fault handler repopulate. >> The handler will insert the ESB page corresponding to the HW interrupt >> of the device being passed-through or the initial IPI ESB page if the >> device is being removed. > > One comment below: > >> @@ -257,6 +289,13 @@ static int kvmppc_xive_native_mmap(struct kvm_device *dev, >> >> vma->vm_flags |= VM_IO | VM_PFNMAP; >> vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot); >> + >> + /* >> + * Grab the KVM device file address_space to be able to clear >> + * the ESB pages mapping when a device is passed-through into >> + * the guest. >> + */ >> + xive->mapping = vma->vm_file->f_mapping; > > I'm worried about the lifetime of this pointer. At the least you > should clear it in the release function for the device, when you get > one. It looks like you should hold the xive->mapping_lock around the > clearing. Yes. This is should be cleaner. > I think that should be OK then since the release function > won't get called until all of the mmaps of the fd have been destroyed, > as I understand things. and you also have to close the fd by exiting QEMU or resetting the guest. Anyhow, I will send a cleanup in a followup patch. Thanks, C.