On 01/26/2016 05:30 AM, Alex Williamson wrote:
> [cc +Neo @Nvidia]
>
> Hi Jike,
>
> On Mon, 2016-01-25 at 19:34 +0800, Jike Song wrote:
>> On 01/20/2016 05:05 PM, Tian, Kevin wrote:
>>> I would expect we can spell out next level tasks toward above
>>> direction, upon which Alex can easily judge whether there are
>>> some common VFIO framework changes that he can help :-)
>>
>> Hi Alex,
>>
>> Here is a draft task list after a short discussion w/ Kevin,
>> would you please have a look?
>>
>> Bus Driver
>>
>> { in i915/vgt/xxx.c }
>>
>> - define a subset of vfio_pci interfaces
>> - selective pass-through (say aperture)
>> - trap MMIO: interface w/ QEMU
>
> What's included in the subset?  Certainly the bus reset ioctls really
> don't apply, but you'll need to support the full device interface,
> right?  That includes the region info ioctl and access through the vfio
> device file descriptor as well as the interrupt info and setup ioctls.
>

[I thought all the interfaces were via ioctls :) For other stuff like
the file descriptor, we'll definitely keep it.]

The list of ioctl commands provided by vfio_pci:

- VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
- VFIO_DEVICE_PCI_HOT_RESET

As you said, the above 2 don't apply. But for this:

- VFIO_DEVICE_RESET

In my opinion it should be kept, no matter what will be provided in the
bus driver.

- VFIO_PCI_ROM_REGION_INDEX
- VFIO_PCI_VGA_REGION_INDEX

I suppose the above 2 don't apply either? For a vgpu we don't provide a
ROM BAR or VGA region.

- VFIO_DEVICE_GET_INFO
- VFIO_DEVICE_GET_REGION_INFO
- VFIO_DEVICE_GET_IRQ_INFO
- VFIO_DEVICE_SET_IRQS

The above 4 are needed, of course. We will need to extend:

- VFIO_DEVICE_GET_REGION_INFO

  a) adding a flag: DONT_MAP. For example, the MMIO of a vgpu should be
     trapped instead of being mmap-ed.

  b) adding other information. For example, for the OpRegion, QEMU needs
     to do more than mmap a region; it has to:

     - allocate a region
     - copy contents from somewhere in the host to that region
     - mmap it to the guest

I remember you already have a prototype for this?

>> IOMMU
>>
>> { in a new vfio_xxx.c }
>>
>> - allocate: struct device & IOMMU group
>
> It seems like the vgpu instance management would do this.
>

Yes, it can be removed from here.

>> - map/unmap functions for vgpu
>> - rb-tree to maintain iova/hpa mappings
>
> Yep, pretty much what type1 does now, but without mapping through the
> IOMMU API.  Essentially just a database of the current userspace
> mappings that can be accessed for page pinning and IOVA->HPA
> translation.
>

Yes.

>> - interacts with kvmgt.c
>>
>>
>> vgpu instance management
>>
>> { in i915 }
>>
>> - path, create/destroy
>>
>
> Yes, and since you're creating and destroying the vgpu here, this is
> where I'd expect a struct device to be created and added to an IOMMU
> group.  The lifecycle management should really include links between
> the vGPU and physical GPU, which would be much, much easier to do with
> struct devices created here rather than at the point where we start
> doing vfio "stuff".
>

Yes, just like SR-IOV does.

> Nvidia has also been looking at this and has some ideas how we might
> standardize on some of the interfaces and create a vgpu framework to
> help share code between vendors and hopefully make a more consistent
> userspace interface for libvirt as well.  I'll let Neo provide some
> details.  Thanks,

Good to know, so we can possibly cooperate on some common parts, e.g.
the instance management :)
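To make that concrete, below is a rough sketch of the create path I have
in mind, following your suggestion of a per-instance struct device
parented to the physical GPU and placed into its own IOMMU group. This is
only an illustration -- the names (vgpu_device, vgpu_create) are made up,
nothing like this exists in i915 today:

/*
 * Illustration only: vgpu_device/vgpu_create are made-up names.
 * Each vgpu instance gets its own struct device, parented to the
 * physical GPU, plus its own IOMMU group so that it can later be
 * handed to a vfio bus driver.
 */
#include <linux/device.h>
#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/pci.h>
#include <linux/slab.h>

struct vgpu_device {
	struct device		dev;	/* per-instance struct device */
	struct iommu_group	*group;	/* its private IOMMU group */
	/* vendor-specific state (vgpu type, MMIO layout, ...) */
};

static void vgpu_dev_release(struct device *dev)
{
	kfree(container_of(dev, struct vgpu_device, dev));
}

/* Create one vgpu instance under the physical GPU @pdev. */
static struct vgpu_device *vgpu_create(struct pci_dev *pdev, int id)
{
	struct vgpu_device *vgpu;
	int ret;

	vgpu = kzalloc(sizeof(*vgpu), GFP_KERNEL);
	if (!vgpu)
		return ERR_PTR(-ENOMEM);

	device_initialize(&vgpu->dev);
	vgpu->dev.parent = &pdev->dev;		/* link vgpu -> physical GPU */
	vgpu->dev.release = vgpu_dev_release;
	dev_set_name(&vgpu->dev, "vgpu%d", id);

	ret = device_add(&vgpu->dev);
	if (ret)
		goto err_put;

	/* no real IOMMU behind this device, so allocate a group by hand */
	vgpu->group = iommu_group_alloc();
	if (IS_ERR(vgpu->group)) {
		ret = PTR_ERR(vgpu->group);
		goto err_del;
	}

	ret = iommu_group_add_device(vgpu->group, &vgpu->dev);
	iommu_group_put(vgpu->group);	/* the device now holds a reference */
	if (ret)
		goto err_del;

	return vgpu;

err_del:
	device_del(&vgpu->dev);
err_put:
	put_device(&vgpu->dev);
	return ERR_PTR(ret);
}

Destroy would just be the reverse: iommu_group_remove_device() followed
by device_unregister().

For the iommu part, the iova->hpa database you described could be little
more than an rb-tree of pinned ranges, e.g. (again just a sketch,
vgpu_iommu/vgpu_mapping are made-up names):

/*
 * Illustration only: a database of current userspace mappings, keyed by
 * iova, which the vendor driver consults for pinning and iova->hpa
 * translation.  Callers are assumed to hold iommu->lock.
 */
#include <linux/mutex.h>
#include <linux/rbtree.h>
#include <linux/slab.h>

struct vgpu_mapping {
	struct rb_node	node;
	unsigned long	iova;	/* guest/IO virtual address */
	unsigned long	hpa;	/* host physical address of pinned pages */
	size_t		size;
};

struct vgpu_iommu {
	struct rb_root	mappings;	/* vgpu_mapping nodes, keyed by iova */
	struct mutex	lock;
};

static struct vgpu_mapping *vgpu_find_mapping(struct vgpu_iommu *iommu,
					      unsigned long iova)
{
	struct rb_node *n = iommu->mappings.rb_node;

	while (n) {
		struct vgpu_mapping *m = rb_entry(n, struct vgpu_mapping, node);

		if (iova < m->iova)
			n = n->rb_left;
		else if (iova >= m->iova + m->size)
			n = n->rb_right;
		else
			return m;
	}
	return NULL;
}

static void vgpu_insert_mapping(struct vgpu_iommu *iommu,
				struct vgpu_mapping *new)
{
	struct rb_node **link = &iommu->mappings.rb_node, *parent = NULL;

	while (*link) {
		struct vgpu_mapping *m;

		parent = *link;
		m = rb_entry(parent, struct vgpu_mapping, node);
		if (new->iova < m->iova)
			link = &(*link)->rb_left;
		else
			link = &(*link)->rb_right;
	}
	rb_link_node(&new->node, parent, link);
	rb_insert_color(&new->node, &iommu->mappings);
}

The map path would pin the user pages and insert such a node, unmap
erases it and unpins, and the vendor driver calls the lookup for
iova->hpa translation before programming the device's DMA.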
>
> Alex

--
Thanks,
Jike