On Wed, May 04, 2016 at 11:06:19AM -0600, Alex Williamson wrote:
> On Wed, 4 May 2016 03:23:13 +0000
> "Tian, Kevin" <kevin.tian@xxxxxxxxx> wrote:
> 
> > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > Sent: Wednesday, May 04, 2016 6:43 AM
> > > > +
> > > > +        if (gpu_dev->ops->write) {
> > > > +                ret = gpu_dev->ops->write(vgpu_dev,
> > > > +                                          user_data,
> > > > +                                          count,
> > > > +                                          vgpu_emul_space_config,
> > > > +                                          pos);
> > > > +        }
> > > > +
> > > > +        memcpy((void *)(vdev->vconfig + pos), (void *)user_data, count);
> > > 
> > > So write is expected to modify user_data to allow only the writable
> > > bits to be changed?  What's really being saved in the vconfig here vs
> > > the vendor vgpu driver?  It seems like we're only using it to cache the
> > > BAR values, but we're not providing the BAR emulation here, which seems
> > > like one of the few things we could provide so it's not duplicated in
> > > every vendor driver.  But then we only need a few u32s to do that, not
> > > all of config space.
> > 
> > We can borrow the same vconfig emulation from the existing vfio-pci
> > driver. But doing so doesn't mean that the vendor vgpu driver cannot
> > have its own vconfig emulation on top. A vGPU is not like a real
> > device, since there may be no physical config space implemented for
> > each vGPU. So the vendor vGPU driver needs to create/emulate the
> > virtualized config space anyway, while the way it is created might be
> > vendor specific. So it's better to keep the interface for the vendor
> > vGPU driver to access the raw vconfig space.
> 
> I'm hoping config space will be very simple for a vgpu, so I don't know
> that it makes sense to add that complexity early on.  Neo/Kirti, what
> capabilities do you expect to provide?  Who provides the MSI
> capability?  Is a PCIe capability provided?  Others?

Currently only standard PCI caps. The MSI cap is emulated by the vendor
drivers via the above interface. No PCIe caps so far.

> 
> > > > +static ssize_t vgpu_dev_rw(void *device_data, char __user *buf,
> > > > +                           size_t count, loff_t *ppos, bool iswrite)
> > > > +{
> > > > +        unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos);
> > > > +        struct vfio_vgpu_device *vdev = device_data;
> > > > +
> > > > +        if (index >= VFIO_PCI_NUM_REGIONS)
> > > > +                return -EINVAL;
> > > > +
> > > > +        switch (index) {
> > > > +        case VFIO_PCI_CONFIG_REGION_INDEX:
> > > > +                return vgpu_dev_config_rw(vdev, buf, count, ppos, iswrite);
> > > > +
> > > > +        case VFIO_PCI_BAR0_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX:
> > > > +                return vgpu_dev_bar_rw(vdev, buf, count, ppos, iswrite);
> > > > +
> > > > +        case VFIO_PCI_ROM_REGION_INDEX:
> > > > +        case VFIO_PCI_VGA_REGION_INDEX:
> > > 
> > > Wait a sec, who's doing the VGA emulation?  We can't be claiming to
> > > support a VGA region and then fail to provide read/write access to it
> > > like we said it has.
> > 
> > On the Intel side we plan not to support the VGA region when
> > upstreaming our KVMGT work, which means the Intel vGPU will be exposed
> > only as a secondary graphics card, so legacy VGA is not required.
> > There is also no VBIOS/ROM requirement. I guess we can remove the
> > above two regions.
> 
> So this needs to be optional based on what the mediation driver
> provides.  It seems like we're just making passthroughs for the vendor
> mediation driver to speak vfio.
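For illustration, here is a minimal sketch of the kind of common BAR-write
emulation discussed above: only the address bits above the BAR size are
writable, and the low flag bits read back unchanged. The field names
(vdev->bar_size[], vdev->vconfig) are hypothetical, not taken from the
patch, and endianness handling is omitted for brevity.

static void vgpu_emulate_bar_write(struct vfio_vgpu_device *vdev,
                                   unsigned int bar, u32 val)
{
        /* Hypothetical: non-zero, power-of-two BAR size from the vendor driver. */
        u32 size = vdev->bar_size[bar];
        u32 mask = ~(size - 1);                 /* writable address bits */
        u32 *vbar = (u32 *)(vdev->vconfig + PCI_BASE_ADDRESS_0 + bar * 4);
        /* Low bits (memory/IO type, prefetchable) are read-only for a memory BAR. */
        u32 flags = *vbar & ~PCI_BASE_ADDRESS_MEM_MASK;

        *vbar = (val & mask) | flags;
}

With something like this in the common code, writing 0xffffffff reads back
as ~(size - 1) plus the flag bits (the usual BAR sizing behavior) without
each vendor driver having to duplicate it.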
> 
> > > > +
> > > > +static int vgpu_dev_mmio_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > > > +{
> > > > +        int ret = 0;
> > > > +        struct vfio_vgpu_device *vdev = vma->vm_private_data;
> > > > +        struct vgpu_device *vgpu_dev;
> > > > +        struct gpu_device *gpu_dev;
> > > > +        u64 virtaddr = (u64)vmf->virtual_address;
> > > > +        u64 offset, phyaddr;
> > > > +        unsigned long req_size, pgoff;
> > > > +        pgprot_t pg_prot;
> > > > +
> > > > +        if (!vdev && !vdev->vgpu_dev)
> > > > +                return -EINVAL;
> > > > +
> > > > +        vgpu_dev = vdev->vgpu_dev;
> > > > +        gpu_dev = vgpu_dev->gpu_dev;
> > > > +
> > > > +        offset = vma->vm_pgoff << PAGE_SHIFT;
> > > > +        phyaddr = virtaddr - vma->vm_start + offset;
> > > > +        pgoff = phyaddr >> PAGE_SHIFT;
> > > > +        req_size = vma->vm_end - virtaddr;
> > > > +        pg_prot = vma->vm_page_prot;
> > > > +
> > > > +        if (gpu_dev->ops->validate_map_request) {
> > > > +                ret = gpu_dev->ops->validate_map_request(vgpu_dev, virtaddr, &pgoff,
> > > > +                                                         &req_size, &pg_prot);
> > > > +                if (ret)
> > > > +                        return ret;
> > > > +
> > > > +                if (!req_size)
> > > > +                        return -EINVAL;
> > > > +        }
> > > > +
> > > > +        ret = remap_pfn_range(vma, virtaddr, pgoff, req_size, pg_prot);
> > > 
> > > So not supporting validate_map_request() means that the user can
> > > directly mmap BARs of the host GPU and as shown below, we assume a 1:1
> > > mapping of vGPU BAR to host GPU BAR.  Is that ever valid in a vGPU
> > > scenario or should this callback be required?  It's not clear to me how
> > > the vendor driver determines what this maps to, do they compare it to
> > > the physical device's own BAR addresses?
> > 
> > I didn't quite understand either. Based on the earlier discussion, do
> > we need something like this, or could we achieve the purpose just by
> > leveraging the recent sparse mmap support?
> 
> The reason for faulting in the mmio space, if I recall correctly, is to
> enable an ordering where the user driver (QEMU) can mmap regions of the
> device prior to resources being allocated on the host GPU to handle
> them.  Sparse mmap only partially handles that; it's not dynamic.  With
> this faulting mechanism, the host GPU doesn't need to commit resources
> until the mmap is actually accessed.  Thanks,

Correct.

Thanks,
Neo

> 
> Alex
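To illustrate why the fault path need not be 1:1, here is a hypothetical
vendor-side validate_map_request() sketch. The helpers (my_vgpu_bar_base(),
aperture_alloc(), APERTURE_CHUNK_SIZE) are made up for the example; only
the callback signature comes from the patch quoted above.

static int my_validate_map_request(struct vgpu_device *vgpu_dev, u64 virtaddr,
                                   unsigned long *pgoff, unsigned long *req_size,
                                   pgprot_t *pg_prot)
{
        /* Offset of the faulting page within the vGPU BAR (hypothetical helper). */
        u64 bar_off = (*pgoff << PAGE_SHIFT) - my_vgpu_bar_base(vgpu_dev);
        phys_addr_t host_pa;

        /* Map at most one aperture chunk per fault (hypothetical constant). */
        *req_size = min_t(unsigned long, *req_size, APERTURE_CHUNK_SIZE);

        /* Host GPU resources are committed only when the range is first touched. */
        host_pa = aperture_alloc(vgpu_dev, bar_off, *req_size);
        if (!host_pa)
                return -ENOMEM;

        /* Redirect the fault to the backing pages; not 1:1 with the physical BAR. */
        *pgoff = host_pa >> PAGE_SHIFT;
        *pg_prot = pgprot_noncached(*pg_prot);

        return 0;
}

The core fault handler then remap_pfn_range()s whatever pgoff/req_size/pg_prot
the callback hands back, which is what lets the host defer committing
aperture space until the guest actually touches the mapping.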