> From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx] > Sent: Friday, June 24, 2016 11:37 AM > > On Fri, 24 Jun 2016 10:52:58 +0800 > Yongji Xie <xyjxie@xxxxxxxxxxxxxxxxxx> wrote: > > On 2016/6/24 0:12, Alex Williamson wrote: > > > On Mon, 30 May 2016 21:06:37 +0800 > > > Yongji Xie <xyjxie@xxxxxxxxxxxxxxxxxx> wrote: > > >> +static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev) > > >> +{ > > >> + struct resource *res; > > >> + int bar; > > >> + struct vfio_pci_dummy_resource *dummy_res; > > >> + > > >> + INIT_LIST_HEAD(&vdev->dummy_resources_list); > > >> + > > >> + for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) { > > >> + res = vdev->pdev->resource + bar; > > >> + > > >> + if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP)) > > >> + goto no_mmap; > > >> + > > >> + if (!(res->flags & IORESOURCE_MEM)) > > >> + goto no_mmap; > > >> + > > >> + /* > > >> + * The PCI core shouldn't set up a resource with a > > >> + * type but zero size. But there may be bugs that > > >> + * cause us to do that. > > >> + */ > > >> + if (!resource_size(res)) > > >> + goto no_mmap; > > >> + > > >> + if (resource_size(res) >= PAGE_SIZE) { > > >> + vdev->bar_mmap_supported[bar] = true; > > >> + continue; > > >> + } > > >> + > > >> + if (!(res->start & ~PAGE_MASK)) { > > >> + /* > > >> + * Add a dummy resource to reserve the remainder > > >> + * of the exclusive page in case that hot-add > > >> + * device's bar is assigned into it. > > >> + */ > > >> + dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL); > > >> + if (dummy_res == NULL) > > >> + goto no_mmap; > > >> + > > >> + dummy_res->resource.start = res->end + 1; > > >> + dummy_res->resource.end = res->start + PAGE_SIZE - 1; > > >> + dummy_res->resource.flags = res->flags; > > >> + if (request_resource(res->parent, > > >> + &dummy_res->resource)) { > > >> + kfree(dummy_res); > > >> + goto no_mmap; > > >> + } > > > Isn't it true that request_resource() only tells us that at a given > > > point in time, no other drivers have reserved that resource? It seems > > > like it does not guarantee that the resource isn't routed to another > > > device or that another driver won't at some point attempt to request > > > that same resource. So for example if a user constructs their initrd > > > to bind vfio-pci to devices before other modules load, this > > > request_resource() may succeed, at the expense of drivers loaded later > > > now failing. The behavior will depend on driver load order and we're > > > not actually insuring that the overflow resource is unused, just that > > > we got it first. Can we do better? Am I missing something that > > > prevents this? Thanks, > > > > > > Alex > > > > Couldn't PCI resources allocator prevent this, which will find a > > empty slot in the resource tree firstly, then try to request that > > resource in allocate_resource() when a PCI device is probed. > > And I'd like to know why a PCI device driver would attempt to > > call request_resource()? Should this be done in PCI enumeration? > > Hi Yongji, > > Looks like most pci drivers call pci_request_regions(). From there the > call path is: > > pci_request_selected_regions > __pci_request_selected_regions > __pci_request_region > __request_mem_region > __request_region > __request_resource > > We see this driver ordering issue sometimes with users attempting to > blacklist native pci drivers, trying to leave a device free for use by > vfio-pci. If the device is a graphics card, the generic vesa or uefi > driver can request device resources causing a failure when vfio-pci > tries to request those same resources. I expect that unless it's a > boot device, like vga in my example, the resources are not enabled > until the driver opens the device, therefore the request_resource() call > doesn't occur until that point. > > For another trivial example, look at /proc/iomem as you load and unload > a driver, on my laptop with e1000e unloaded I see: > > e1200000-e121ffff : 0000:00:19.0 > e123e000-e123efff : 0000:00:19.0 > > When e1000e is loaded, each of these becomes claimed by the e1000e > driver: > > e1200000-e121ffff : 0000:00:19.0 > e1200000-e121ffff : e1000e > e123e000-e123efff : 0000:00:19.0 > e123e000-e123efff : e1000e > > Clearly pci core knows the resource is associated with the device, but > I don't think we're tapping into that with request_resource(), we're > just potentially stealing resources that another driver might have > claimed otherwise as I described above. That's my suspicion at > least, feel free to show otherwise if it's incorrect. Thanks, > > Alex It's a problem unless there is a way to trigger resource re-assignment (e.g. pci_assign_resource) on devices which if claim on same resource by VFIO. But doing this for every request_resource failure looks an overkill... Thanks Kevin -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html