On 9/7/2016 10:14 PM, Alex Williamson wrote:
> On Wed, 7 Sep 2016 21:45:31 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>
>> On 9/7/2016 2:58 AM, Alex Williamson wrote:
>>> On Wed, 7 Sep 2016 01:05:11 +0530
>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>>>
>>>> On 9/6/2016 11:10 PM, Alex Williamson wrote:
>>>>> On Sat, 3 Sep 2016 22:04:56 +0530
>>>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>>>>>
>>>>>> On 9/3/2016 3:18 AM, Paolo Bonzini wrote:
>>>>>>>
>>>>>>> On 02/09/2016 20:33, Kirti Wankhede wrote:
>>>>>>>> <Alex> We could even do:
>>>>>>>>>>
>>>>>>>>>> echo $UUID1:$GROUPA > create
>>>>>>>>>>
>>>>>>>>>> where $GROUPA is the group ID of a previously created mdev device into
>>>>>>>>>> which $UUID1 is to be created and added to the same group.
>>>>>>>> </Alex>
>>>>>>>
>>>>>>> From the point of view of libvirt, I think I prefer Alex's idea.
>>>>>>> <group> could be an additional element in the nodedev-create XML:
>>>>>>>
>>>>>>>     <device>
>>>>>>>       <name>my-vgpu</name>
>>>>>>>       <parent>pci_0000_86_00_0</parent>
>>>>>>>       <capability type='mdev'>
>>>>>>>         <type id='11'/>
>>>>>>>         <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>>>>>>>         <group>group1</group>
>>>>>>>       </capability>
>>>>>>>     </device>
>>>>>>>
>>>>>>> (should group also be a UUID?)
>>>>>>
>>>>>> No, this should be a unique number in a system, similar to iommu_group.
>>>>>
>>>>> Sorry, just trying to catch up on this thread after a long weekend.
>>>>>
>>>>> We're talking about iommu groups here, we're not creating any sort of
>>>>> parallel grouping specific to mdev devices.
>>>>
>>>> I thought we were talking about a group of mdev devices and not an
>>>> iommu group. IIRC, there were concerns about it (this would be similar
>>>> to UUID+instance) and that it would (ab)use iommu groups.
>>>
>>> What constraints does a group, which is not an iommu group, place on the
>>> usage of the mdev devices? What happens if we put two mdev devices in
>>> the same "mdev group" and then assign them to separate VMs/users? I
>>> believe that the answer is that this theoretical "mdev group" doesn't
>>> actually impose any constraints on the devices within the group or how
>>> they're used.
>>
>> We feel it's not a good idea to try to associate a device's iommu group
>> with mdev device groups. That adds more complications.
>>
>> As in the nodedev-create XML above, 'group1' could be a unique number
>> generated by libvirt. Then, to create an mdev device:
>>
>>   echo $UUID1:group1 > create
>>
>> If the user wants to add more mdev devices to the same group, he/she
>> should use the same group number for the next nodedev-create devices.
>> So the create commands would be:
>>
>>   echo $UUID2:group1 > create
>>   echo $UUID3:group1 > create
>
> So groups return to being static, libvirt would need to destroy and
> create mdev devices specifically for use within the predefined group?

Yes.

> This imposes limitations on how mdev devices can be used (ie. the mdev
> pool option is once again removed). We're also back to imposing
> grouping semantics on mdev devices that may not need them. Do all mdev
> devices for a given user need to be put into the same group?

Yes.

> Do groups span parent devices? Do they span different vendor drivers?

Yes and yes. The group number would be associated with the mdev device
irrespective of its parent.

>> Each mdev device would store this group number in its mdev_device
>> structure.
>>
>> With this, we would add open() and close() callbacks from the vfio_mdev
>> module for the vendor driver to commit resources.
>> Then we don't need
>> the 'start'/'stop' or online/offline interface.
>>
>> To commit resources for all devices associated with a domain/user-space
>> application, the vendor driver can commit them on the 'first open()' and
>> free them on the 'last close()'. Or, if the vendor driver wants to
>> commit resources for each device separately, it can do so in each
>> device's open() call. How to implement this is up to the vendor driver.
>>
>> Libvirt doesn't have to do anything with the assigned group numbers
>> while managing mdev devices.
>>
>> The QEMU command-line parameters would be the same as before (no need
>> to mention the group number here):
>>
>>   -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID1 \
>>   -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID2
>>
>> If two mdev devices from the same group are assigned to different
>> domains, we can fail the open() call of the second device. How would the
>> driver know that they are being used by different domains? By checking
>> the <group1, pid> of the first device of 'group1'. Both devices in the
>> same group should have the same pid in their open() call.
>
> Are you assuming that the two devices are owned by the same vendor
> driver?

No. See my replies to the next questions below.

> What if I put NVIDIA and Intel vGPUs both into the same group
> and give each of them to a separate VM?

It depends on where we put the logic that verifies the pid in the open()
call of each device in the group. If the <group, pid> check for the
devices in a group lives in the vendor driver, then in the above case
both VMs would boot. But if we impose this logic in the mdev core or the
vfio_mdev module, then open() on the second device should fail.

> How would the NVIDIA host
> driver know which <group, pid> the Intel device got?

How the group number is used to commit resources for the devices a
vendor owns would be that vendor driver's responsibility. The NVIDIA
driver doesn't need to know about Intel's vGPU, nor does the Intel
driver need to know about NVIDIA's vGPU.

> This is what the
> iommu groups do that a different layer of grouping cannot do. Maybe
> you're suggesting a group per vendor driver, but how does libvirt know
> the vendor driver? Do they need to go research the parent device in
> sysfs and compare driver links?

No, a group is not associated with a vendor driver. The group number is
associated with the mdev device.

Thanks,
Kirti
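
PS: To make the <group, pid> idea above a bit more concrete, here is a
minimal sketch of how the check could look if it lived in the mdev core /
vfio_mdev layer rather than in each vendor driver. None of these names
(mdev_group, mdev_group_open, mdev_group_release) exist in the current
code; they are made up purely to illustrate the "first open()" /
"last close()" bookkeeping, and assume grp->lock is initialized when the
group is created.

#include <linux/errno.h>
#include <linux/mutex.h>
#include <linux/pid.h>
#include <linux/sched.h>

struct mdev_group {
	unsigned int	group_id;	/* number from "echo $UUID:groupN > create" */
	struct pid	*owner;		/* pid of the first opener, NULL when idle */
	unsigned int	open_count;	/* open devices belonging to this group */
	struct mutex	lock;
};

/* Called by the mdev core before forwarding open() to the vendor driver. */
static int mdev_group_open(struct mdev_group *grp)
{
	int ret = 0;

	mutex_lock(&grp->lock);
	if (grp->open_count && grp->owner != task_pid(current)) {
		/* Second device of the group opened from a different domain. */
		ret = -EBUSY;
	} else {
		if (!grp->open_count)
			grp->owner = get_pid(task_pid(current)); /* first open() */
		grp->open_count++;
	}
	mutex_unlock(&grp->lock);

	return ret;
}

/* Called by the mdev core after the vendor driver's close()/release(). */
static void mdev_group_release(struct mdev_group *grp)
{
	mutex_lock(&grp->lock);
	if (!--grp->open_count) {
		put_pid(grp->owner);	/* last close() */
		grp->owner = NULL;
	}
	mutex_unlock(&grp->lock);
}

With something along these lines the NVIDIA/Intel example above would
fail the second open() with -EBUSY regardless of which vendor drivers
are involved, since the check sits above the vendor callbacks. (A real
implementation might compare the thread-group id rather than the
per-thread pid; this sketch just mirrors the "same pid in open()"
wording used earlier in the thread.)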