On 5/5/2016 2:44 AM, Neo Jia wrote:
On Wed, May 04, 2016 at 11:06:19AM -0600, Alex Williamson wrote:
On Wed, 4 May 2016 03:23:13 +0000
"Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
Sent: Wednesday, May 04, 2016 6:43 AM
+
+	if (gpu_dev->ops->write) {
+		ret = gpu_dev->ops->write(vgpu_dev,
+					  user_data,
+					  count,
+					  vgpu_emul_space_config,
+					  pos);
+	}
+
+	memcpy((void *)(vdev->vconfig + pos), (void *)user_data, count);
So write is expected to modify user_data so that only the writable bits are
changed? What's really being saved in the vconfig here vs the vendor
vgpu driver? It seems like we're only using it to cache the BAR
values, but we're not providing the BAR emulation here, which seems
like one of the few things we could provide so it's not duplicated in
every vendor driver. But then we only need a few u32s to do that, not
all of config space.
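[ For illustration, a minimal sketch (not part of the posted patch) of what
common-layer BAR register emulation could look like.  The bar_size[] array is
a hypothetical field where the core would keep each BAR's probed size; only
32-bit memory BARs are handled and endianness is ignored for brevity. ]

static void vgpu_emulate_bar_write(struct vfio_vgpu_device *vdev,
				   int bar, u32 val)
{
	u32 size = vdev->bar_size[bar];		/* power-of-two region size */
	u32 mask = ~(size - 1);			/* only address bits above the size are writable */
	u32 *cfg = (u32 *)(vdev->vconfig + PCI_BASE_ADDRESS_0 + bar * 4);
	u32 flags = *cfg & ~PCI_BASE_ADDRESS_MEM_MASK;	/* low read-only flag bits */

	/* A write of all 1s (BAR sizing by the guest) falls out of the mask. */
	*cfg = (val & mask) | flags;
}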
We can borrow the same vconfig emulation from the existing vfio-pci driver.
But doing so doesn't mean that the vendor vgpu driver cannot also layer its
own vconfig emulation on top. A vGPU is not like a real device, since
there may be no physical config space implemented for each vGPU.
So the vendor vGPU driver needs to create/emulate the virtualized config
space anyway, and the way it is created might be vendor specific.
It's therefore better to keep an interface for the vendor vGPU driver to
access the raw vconfig space.
I'm hoping config space will be very simple for a vgpu, so I don't know
that it makes sense to add that complexity early on. Neo/Kirti, what
capabilities do you expect to provide? Who provides the MSI
capability? Is a PCIe capability provided? Others?
From the VGPU_VFIO point of view, VGPU_VFIO would not provide or modify any
capabilities. The vendor vGPU driver should provide the config space; the
vendor driver can then provide PCI capabilities or PCIe capabilities, and it
might also expose vendor-specific information. The VGPU_VFIO driver would not
intercept that information.
Currently only standard PCI caps.
The MSI cap is emulated by the vendor drivers via the above interface.
No PCIe caps so far.
The Nvidia vGPU device is a standard PCI device; we have tested with standard
PCI caps.
Thanks,
Kirti.
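[ As an illustration of that split, a hypothetical sketch (not from the
patch) of a vendor driver populating its emulated config space with a
standard MSI capability.  The 0x40 capability offset and the raw u8 vconfig
buffer are assumptions; register offsets come from <uapi/linux/pci_regs.h>. ]

#define VGPU_MSI_CAP_OFFSET	0x40	/* assumed free offset past the header */

static void vendor_init_vconfig(u8 *vconfig)
{
	/* Standard header fields (IDs, class code, BARs) would be set up here. */

	vconfig[PCI_STATUS] |= PCI_STATUS_CAP_LIST;	/* advertise a capability list */
	vconfig[PCI_CAPABILITY_LIST] = VGPU_MSI_CAP_OFFSET;

	/* MSI capability: ID 0x05, no next capability, 64-bit address capable. */
	vconfig[VGPU_MSI_CAP_OFFSET + PCI_CAP_LIST_ID] = PCI_CAP_ID_MSI;
	vconfig[VGPU_MSI_CAP_OFFSET + PCI_CAP_LIST_NEXT] = 0;
	vconfig[VGPU_MSI_CAP_OFFSET + PCI_MSI_FLAGS] = PCI_MSI_FLAGS_64BIT;
}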
+static ssize_t vgpu_dev_rw(void *device_data, char __user *buf,
+			   size_t count, loff_t *ppos, bool iswrite)
+{
+	unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
+	struct vfio_vgpu_device *vdev = device_data;
+
+	if (index >= VFIO_PCI_NUM_REGIONS)
+		return -EINVAL;
+
+	switch (index) {
+	case VFIO_PCI_CONFIG_REGION_INDEX:
+		return vgpu_dev_config_rw(vdev, buf, count, ppos, iswrite);
+
+	case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
+		return vgpu_dev_bar_rw(vdev, buf, count, ppos, iswrite);
+
+	case VFIO_PCI_ROM_REGION_INDEX:
+	case VFIO_PCI_VGA_REGION_INDEX:
Wait a sec, who's doing the VGA emulation? We can't be claiming to
support a VGA region and then fail to provide read/write access to it
like we said it has.
On the Intel side we plan not to support the VGA region when upstreaming our
KVMGT work, which means the Intel vGPU will be exposed only as a secondary
graphics card, so legacy VGA is not required. There is also no VBIOS/ROM
requirement. I guess we can remove the above two regions.
So this needs to be optional based on what the mediation driver
provides. It seems like we're just making passthroughs for the vendor
mediation driver to speak vfio.
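[ For example, a sketch (not from the posted patch) of how the switch in
vgpu_dev_rw() could make these regions optional.  The region_present bitmap
and the vgpu_dev_region_rw() helper are hypothetical; the vendor mediation
driver would fill the bitmap in when it registers the vgpu. ]

	case VFIO_PCI_ROM_REGION_INDEX:
	case VFIO_PCI_VGA_REGION_INDEX:
		/* Only expose regions the vendor mediation driver backs. */
		if (!test_bit(index, vdev->region_present))
			return -EINVAL;
		return vgpu_dev_region_rw(vdev, buf, count, ppos, iswrite);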
+
+static int vgpu_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	int ret = 0;
+	struct vfio_vgpu_device *vdev = vma->vm_private_data;
+	struct vgpu_device *vgpu_dev;
+	struct gpu_device *gpu_dev;
+	u64 virtaddr = (u64)vmf->virtual_address;
+	u64 offset, phyaddr;
+	unsigned long req_size, pgoff;
+	pgprot_t pg_prot;
+
+	if (!vdev || !vdev->vgpu_dev)
+		return -EINVAL;
+
+	vgpu_dev = vdev->vgpu_dev;
+	gpu_dev = vgpu_dev->gpu_dev;
+
+	offset = vma->vm_pgoff << PAGE_SHIFT;
+	phyaddr = virtaddr - vma->vm_start + offset;
+	pgoff = phyaddr >> PAGE_SHIFT;
+	req_size = vma->vm_end - virtaddr;
+	pg_prot = vma->vm_page_prot;
+
+	if (gpu_dev->ops->validate_map_request) {
+		ret = gpu_dev->ops->validate_map_request(vgpu_dev, virtaddr, &pgoff,
+							 &req_size, &pg_prot);
+		if (ret)
+			return ret;
+
+		if (!req_size)
+			return -EINVAL;
+	}
+
+	ret = remap_pfn_range(vma, virtaddr, pgoff, req_size, pg_prot);
So not supporting validate_map_request() means that the user can
directly mmap BARs of the host GPU and as shown below, we assume a 1:1
mapping of vGPU BAR to host GPU BAR. Is that ever valid in a vGPU
scenario or should this callback be required? It's not clear to me how
the vendor driver determines what this maps to; do they compare it to
the physical device's own BAR addresses?
I didn't quite understand either. Based on the earlier discussion, do we need
something like this, or could we achieve the purpose just by leveraging the
recent sparse mmap support?
The reason for faulting in the mmio space, if I recall correctly, is to
enable an ordering where the user driver (QEMU) can mmap regions of the
device prior to resources being allocated on the host GPU to handle
them. Sparse mmap only partially handles that; it's not dynamic. With
this faulting mechanism, the host GPU doesn't need to commit resources
until the mmap is actually accessed. Thanks,
Correct.
Thanks,
Neo
Alex
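[ For reference, a minimal sketch (not from the posted patch) of the mmap
side of the fault-based scheme described above, reusing the patch's
vgpu_dev_mmio_fault() as the fault handler: mmap() only records the vma, so
the host GPU commits nothing until the mapping is first touched. ]

static const struct vm_operations_struct vgpu_dev_mmio_vm_ops = {
	.fault = vgpu_dev_mmio_fault,
};

static int vgpu_dev_mmap(void *device_data, struct vm_area_struct *vma)
{
	struct vfio_vgpu_device *vdev = device_data;

	/* No remap_pfn_range() here; the fault handler maps pages on demand. */
	vma->vm_flags |= VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP;
	vma->vm_private_data = vdev;
	vma->vm_ops = &vgpu_dev_mmio_vm_ops;

	return 0;
}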