> From: Neo Jia [mailto:cjia@xxxxxxxxxx]
> Sent: Wednesday, January 27, 2016 5:31 AM
>
> On Tue, Jan 26, 2016 at 09:21:42PM +0000, Tian, Kevin wrote:
> > > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx]
> > > Sent: Wednesday, January 27, 2016 12:37 AM
> > >
> > > On Tue, 2016-01-26 at 22:05 +0800, Yang Zhang wrote:
> > > > On 2016/1/26 15:41, Jike Song wrote:
> > > > > On 01/26/2016 05:30 AM, Alex Williamson wrote:
> > > > > > [cc +Neo @Nvidia]
> > > > > >
> > > > > > Hi Jike,
> > > > > >
> > > > > > On Mon, 2016-01-25 at 19:34 +0800, Jike Song wrote:
> > > > > > > On 01/20/2016 05:05 PM, Tian, Kevin wrote:
> > > > > > > > I would expect we can spell out next level tasks toward above
> > > > > > > > direction, upon which Alex can easily judge whether there are
> > > > > > > > some common VFIO framework changes that he can help :-)
> > > > > > >
> > > > > > > Hi Alex,
> > > > > > >
> > > > > > > Here is a draft task list after a short discussion w/ Kevin,
> > > > > > > would you please have a look?
> > > > > > >
> > > > > > > Bus Driver
> > > > > > >
> > > > > > > { in i915/vgt/xxx.c }
> > > > > > >
> > > > > > > - define a subset of vfio_pci interfaces
> > > > > > > - selective pass-through (say aperture)
> > > > > > > - trap MMIO: interface w/ QEMU
> > > > > >
> > > > > > What's included in the subset? Certainly the bus reset ioctls really
> > > > > > don't apply, but you'll need to support the full device interface,
> > > > > > right? That includes the region info ioctl and access through the vfio
> > > > > > device file descriptor as well as the interrupt info and setup ioctls.
> > > > > >
> > > > >
> > > > > [All interfaces I thought are via ioctl:) For other stuff like file
> > > > > descriptor we'll definitely keep it.]
> > > > >
> > > > > The list of ioctl commands provided by vfio_pci:
> > > > >
> > > > > - VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > > > > - VFIO_DEVICE_PCI_HOT_RESET
> > > > >
> > > > > As you said, above 2 don't apply.
> > > > > But for this:
> > > > >
> > > > > - VFIO_DEVICE_RESET
> > > > >
> > > > > In my opinion it should be kept, no matter what will be provided in
> > > > > the bus driver.
> > > > >
> > > > > - VFIO_PCI_ROM_REGION_INDEX
> > > > > - VFIO_PCI_VGA_REGION_INDEX
> > > > >
> > > > > I suppose above 2 don't apply neither? For a vgpu we don't provide a
> > > > > ROM BAR or VGA region.
> > > > >
> > > > > - VFIO_DEVICE_GET_INFO
> > > > > - VFIO_DEVICE_GET_REGION_INFO
> > > > > - VFIO_DEVICE_GET_IRQ_INFO
> > > > > - VFIO_DEVICE_SET_IRQS
> > > > >
> > > > > Above 4 are needed of course.
> > > > >
> > > > > We will need to extend:
> > > > >
> > > > > - VFIO_DEVICE_GET_REGION_INFO
> > > > >
> > > > >
> > > > > a) adding a flag: DONT_MAP. For example, the MMIO of vgpu
> > > > > should be trapped instead of being mmap-ed.
> > > >
> > > > I may not in the context, but i am curious how to handle the DONT_MAP in
> > > > vfio driver? Since there are no real MMIO maps into the region and i
> > > > suppose the access to the region should be handled by vgpu in i915
> > > > driver, but currently most of the mmio accesses are handled by Qemu.
> > >
> > > VFIO supports the following region attributes:
> > >
> > > #define VFIO_REGION_INFO_FLAG_READ (1 << 0) /* Region supports read */
> > > #define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */
> > > #define VFIO_REGION_INFO_FLAG_MMAP (1 << 2) /* Region supports mmap */
> > >
> > > If MMAP is not set, then the QEMU driver will do pread and/or pwrite to
> > > the specified offsets of the device file descriptor, depending on what
> > > accesses are supported. This is all reported through the REGION_INFO
> > > ioctl for a given index. If mmap is supported, the VM will have direct
> > > access to the area, without faulting to KVM other than to populate the
> > > mapping.
> > > Without mmap support, a VM MMIO access traps into KVM, which
> > > returns out to QEMU to service the request, which then finds the
> > > MemoryRegion serviced through vfio, which will then perform a
> > > pread/pwrite through to the kernel vfio bus driver to handle the
> > > access. Thanks,
> > >
> >
> > Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to
> > KVM, so VM MMIO access will be forwarded to KVMGT directly for
> > emulation in kernel. If we reuse above R/W flags, the whole emulation
> > path would be unnecessarily long with obvious performance impact. We
> > either need a new flag here to indicate in-kernel emulation (bias from
> > passthrough support), or just hide the region alternatively (let KVMGT
> > to handle I/O emulation itself like today).
> >
>
> Hi Kevin,
>
> Maybe there is some confusion about the VFIO interface that we are going to use
> here. I thought we were going to adopt VFIO so nobody would need to directly
> plug into kvm module.
>

We have a reason to do so: looping kernel->user->kernel multiplies the
emulation overhead of each trap, which can have an obvious impact on some
performance-critical paths. We discussed the rationale with the KVM
maintainer (Paolo, iirc) last year. We can extend the VFIO interface to
support such a model in general.

Thanks
Kevin
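
For reference, here is a rough sketch of the access-path choice Alex
describes in the quoted text: if MMAP is set the region can be mapped
directly, otherwise accesses go through pread/pwrite on the device file
descriptor. The flag values are the ones quoted in this thread. The enum and
function names are hypothetical, made up for illustration, and are not taken
from QEMU or the kernel. The in-kernel emulation case discussed above has no
flag today, so it is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Region flags as quoted in the thread. */
#define VFIO_REGION_INFO_FLAG_READ  (1 << 0) /* Region supports read */
#define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */
#define VFIO_REGION_INFO_FLAG_MMAP  (1 << 2) /* Region supports mmap */

/* Hypothetical classification of how userspace would reach a region. */
enum region_access {
    REGION_ACCESS_NONE,    /* no access advertised for this region */
    REGION_ACCESS_TRAPPED, /* pread/pwrite on the device file descriptor */
    REGION_ACCESS_DIRECT   /* mmap; the VM touches the area directly */
};

/* Pick the access path from the flags reported for a region index. */
static enum region_access pick_region_access(uint32_t flags)
{
    if (flags & VFIO_REGION_INFO_FLAG_MMAP)
        return REGION_ACCESS_DIRECT;
    if (flags & (VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE))
        return REGION_ACCESS_TRAPPED;
    return REGION_ACCESS_NONE;
}

/* Human-readable label, e.g. for a debug log line. */
static const char *region_access_name(enum region_access access)
{
    switch (access) {
    case REGION_ACCESS_DIRECT:
        return "mmap (direct)";
    case REGION_ACCESS_TRAPPED:
        return "pread/pwrite (trapped)";
    case REGION_ACCESS_NONE:
        return "none";
    }
    assert(0 && "unknown region_access value");
    return "unknown";
}
```

With these definitions, a region reporting only READ | WRITE (the trapped
vgpu MMIO case) is classified as REGION_ACCESS_TRAPPED, while a region that
also sets MMAP (say, the aperture) is classified as REGION_ACCESS_DIRECT.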