On Mon, 22 Feb 2021 13:22:30 -0400
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> On Mon, Feb 22, 2021 at 09:51:13AM -0700, Alex Williamson wrote:
> 
> > +	vfio_device_unmap_mapping_range(vdev->device,
> > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
> > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
> > +			VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));
> 
> Isn't this the same as invalidating everything? I see in
> vfio_pci_mmap():
> 
> 	if (index >= VFIO_PCI_ROM_REGION_INDEX)
> 		return -EINVAL;

No, immediately above that is:

	if (index >= VFIO_PCI_NUM_REGIONS) {
		int regnum = index - VFIO_PCI_NUM_REGIONS;
		struct vfio_pci_region *region = vdev->region + regnum;

		if (region && region->ops && region->ops->mmap &&
		    (region->flags & VFIO_REGION_INFO_FLAG_MMAP))
			return region->ops->mmap(vdev, region, vma);
		return -EINVAL;
	}

We can have device specific regions that can support mmap, but those
regions aren't necessarily on-device memory, so we can't assume they're
tied to the memory bit in the command register.

> > @@ -2273,15 +2112,13 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
> >  
> >  	vdev = vfio_device_data(device);
> >  
> > -	/*
> > -	 * Locking multiple devices is prone to deadlock, runaway and
> > -	 * unwind if we hit contention.
> > -	 */
> > -	if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
> > +	if (!down_write_trylock(&vdev->memory_lock)) {
> >  		vfio_device_put(device);
> >  		return -EBUSY;
> >  	}
> 
> And this is only done as part of VFIO_DEVICE_PCI_HOT_RESET?

Yes.

> It looks like VFIO_DEVICE_PCI_HOT_RESET effects the entire slot?

Yes.

> How about putting the inode on the reflck structure, which is also
> per-slot, and then a single unmap_mapping_range() will take care of
> everything, no need to iterate over things in the driver core.
> 
> Note the vm->pg_off space doesn't have any special meaning, it is
> fine that two struct vfio_pci_device's are sharing the same address
> space and using an incompatible overlapping pg_offs

Ok, but how does this really help us, unless you're also proposing some
redesign of the memory_lock semaphore?  Even if we're zapping all the
affected devices for a bus reset, that doesn't eliminate the fact that
we still require device level granularity for other events.  Maybe
there's some layering of the inodes that you're implying that allows
both, but it still feels like a minor optimization if we need to
traverse devices for the memory_lock.

> > diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
> > index 9cd1882a05af..ba37f4eeefd0 100644
> > +++ b/drivers/vfio/pci/vfio_pci_private.h
> > @@ -101,6 +101,7 @@ struct vfio_pci_mmap_vma {
> >  
> >  struct vfio_pci_device {
> >  	struct pci_dev		*pdev;
> > +	struct vfio_device	*device;
> 
> Ah, I did this too, but I didn't use a pointer :)

vfio_device is embedded in vfio.c, so that worries me.

> All the places trying to call vfio_device_put() when they really want
> a vfio_pci_device * become simpler now.  Eg struct vfio_devices wants
> to have an array of vfio_pci_device, and get_pf_vdev() only needs to
> return one pointer.

Sure, that example would be a good simplification.  I'm not sure I see
other cases where we're going out of our way to manage the vfio_device
versus vfio_pci_device objects, though.  Thanks,

Alex
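
P.S. To make the per-device pattern being debated above a little more
concrete, here's a rough sketch.  This is illustration only, not the
actual driver code: it reuses the helper and field names from the patch
quoted above (vfio_device_unmap_mapping_range(), memory_lock,
VFIO_PCI_INDEX_TO_OFFSET()), and the function names are made up for the
example.

	/*
	 * Sketch 1: zap only the fixed BAR mmap offsets.  Device
	 * specific regions (index >= VFIO_PCI_NUM_REGIONS) keep their
	 * mappings, since they aren't necessarily backed by on-device
	 * memory gated by the memory enable bit.
	 */
	static void zap_bar_mmaps(struct vfio_pci_device *vdev)
	{
		u64 start = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX);
		u64 end = VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX);

		vfio_device_unmap_mapping_range(vdev->device, start,
						end - start);
	}

	/*
	 * Sketch 2: hot reset touches every device in the slot, so each
	 * one is try-locked; on contention the caller unwinds with
	 * -EBUSY rather than risking an ABBA deadlock across devices.
	 */
	static int try_lock_memory(struct vfio_pci_device *vdev)
	{
		if (!down_write_trylock(&vdev->memory_lock))
			return -EBUSY;

		return 0;
	}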