RE: [PATCH v4] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> Sent: Friday, June 24, 2016 11:37 AM
> 
> On Fri, 24 Jun 2016 10:52:58 +0800
> Yongji Xie <xyjxie@xxxxxxxxxxxxxxxxxx> wrote:
> > On 2016/6/24 0:12, Alex Williamson wrote:
> > > On Mon, 30 May 2016 21:06:37 +0800
> > > Yongji Xie <xyjxie@xxxxxxxxxxxxxxxxxx> wrote:
> > >> +static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
> > >> +{
> > >> +	struct resource *res;
> > >> +	int bar;
> > >> +	struct vfio_pci_dummy_resource *dummy_res;
> > >> +
> > >> +	INIT_LIST_HEAD(&vdev->dummy_resources_list);
> > >> +
> > >> +	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
> > >> +		res = vdev->pdev->resource + bar;
> > >> +
> > >> +		if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
> > >> +			goto no_mmap;
> > >> +
> > >> +		if (!(res->flags & IORESOURCE_MEM))
> > >> +			goto no_mmap;
> > >> +
> > >> +		/*
> > >> +		 * The PCI core shouldn't set up a resource with a
> > >> +		 * type but zero size. But there may be bugs that
> > >> +		 * cause us to do that.
> > >> +		 */
> > >> +		if (!resource_size(res))
> > >> +			goto no_mmap;
> > >> +
> > >> +		if (resource_size(res) >= PAGE_SIZE) {
> > >> +			vdev->bar_mmap_supported[bar] = true;
> > >> +			continue;
> > >> +		}
> > >> +
> > >> +		if (!(res->start & ~PAGE_MASK)) {
> > >> +			/*
> > >> +			 * Add a dummy resource to reserve the remainder
> > >> +			 * of the exclusive page in case that hot-add
> > >> +			 * device's bar is assigned into it.
> > >> +			 */
> > >> +			dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL);
> > >> +			if (dummy_res == NULL)
> > >> +				goto no_mmap;
> > >> +
> > >> +			dummy_res->resource.start = res->end + 1;
> > >> +			dummy_res->resource.end = res->start + PAGE_SIZE - 1;
> > >> +			dummy_res->resource.flags = res->flags;
> > >> +			if (request_resource(res->parent,
> > >> +						&dummy_res->resource)) {
> > >> +				kfree(dummy_res);
> > >> +				goto no_mmap;
> > >> +			}
> > > Isn't it true that request_resource() only tells us that at a given
> > > point in time, no other drivers have reserved that resource?  It seems
> > > like it does not guarantee that the resource isn't routed to another
> > > device or that another driver won't at some point attempt to request
> > > that same resource.  So for example if a user constructs their initrd
> > > to bind vfio-pci to devices before other modules load, this
> > > request_resource() may succeed, at the expense of drivers loaded later
> > > now failing.  The behavior will depend on driver load order and we're
> > > not actually insuring that the overflow resource is unused, just that
> > > we got it first.  Can we do better?  Am I missing something that
> > > prevents this?  Thanks,
> > >
> > > Alex
> >
> > Couldn't PCI resources allocator prevent this, which will find a
> > empty slot in the resource tree firstly, then try to request that
> > resource in allocate_resource() when a PCI device is probed.
> > And I'd like to know why a PCI device driver would attempt to
> > call request_resource()? Should this be done in PCI enumeration?
> 
> Hi Yongji,
> 
> Looks like most pci drivers call pci_request_regions().  From there the
> call path is:
> 
> pci_request_selected_regions
>   __pci_request_selected_regions
>     __pci_request_region
>       __request_mem_region
>         __request_region
>           __request_resource
> 
> We see this driver ordering issue sometimes with users attempting to
> blacklist native pci drivers, trying to leave a device free for use by
> vfio-pci.  If the device is a graphics card, the generic vesa or uefi
> driver can request device resources causing a failure when vfio-pci
> tries to request those same resources.  I expect that unless it's a
> boot device, like vga in my example, the resources are not enabled
> until the driver opens the device, therefore the request_resource() call
> doesn't occur until that point.
> 
> For another trivial example, look at /proc/iomem as you load and unload
> a driver, on my laptop with e1000e unloaded I see:
> 
>   e1200000-e121ffff : 0000:00:19.0
>   e123e000-e123efff : 0000:00:19.0
> 
> When e1000e is loaded, each of these becomes claimed by the e1000e
> driver:
> 
>   e1200000-e121ffff : 0000:00:19.0
>     e1200000-e121ffff : e1000e
>   e123e000-e123efff : 0000:00:19.0
>     e123e000-e123efff : e1000e
> 
> Clearly pci core knows the resource is associated with the device, but
> I don't think we're tapping into that with request_resource(), we're
> just potentially stealing resources that another driver might have
> claimed otherwise as I described above.  That's my suspicion at
> least, feel free to show otherwise if it's incorrect.  Thanks,
> 
> Alex

It's a problem unless there is a way to trigger resource re-assignment
(e.g. pci_assign_resource) on devices which if claim on same resource
by VFIO. But doing this for every request_resource failure looks an
overkill...

Thanks
Kevin
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux