On 9/7/2016 10:14 PM, Alex Williamson wrote:
> On Wed, 7 Sep 2016 21:45:31 +0530
> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>
>> On 9/7/2016 2:58 AM, Alex Williamson wrote:
>>> On Wed, 7 Sep 2016 01:05:11 +0530
>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>>>
>>>> On 9/6/2016 11:10 PM, Alex Williamson wrote:
>>>>> On Sat, 3 Sep 2016 22:04:56 +0530
>>>>> Kirti Wankhede <kwankhede@xxxxxxxxxx> wrote:
>>>>>
>>>>>> On 9/3/2016 3:18 AM, Paolo Bonzini wrote:
>>>>>>>
>>>>>>> On 02/09/2016 20:33, Kirti Wankhede wrote:
>>>>>>>> <Alex> We could even do:
>>>>>>>>>>
>>>>>>>>>> echo $UUID1:$GROUPA > create
>>>>>>>>>>
>>>>>>>>>> where $GROUPA is the group ID of a previously created mdev device into
>>>>>>>>>> which $UUID1 is to be created and added to the same group.
>>>>>>>> </Alex>
>>>>>>>
>>>>>>> From the point of view of libvirt, I think I prefer Alex's idea.
>>>>>>> <group> could be an additional element in the nodedev-create XML:
>>>>>>>
>>>>>>>     <device>
>>>>>>>       <name>my-vgpu</name>
>>>>>>>       <parent>pci_0000_86_00_0</parent>
>>>>>>>       <capability type='mdev'>
>>>>>>>         <type id='11'/>
>>>>>>>         <uuid>0695d332-7831-493f-9e71-1c85c8911a08</uuid>
>>>>>>>         <group>group1</group>
>>>>>>>       </capability>
>>>>>>>     </device>
>>>>>>>
>>>>>>> (should group also be a UUID?)
>>>>>>
>>>>>> No, this should be a unique number in a system, similar to iommu_group.
>>>>>
>>>>> Sorry, just trying to catch up on this thread after a long weekend.
>>>>>
>>>>> We're talking about iommu groups here, we're not creating any sort of
>>>>> parallel grouping specific to mdev devices.
>>>>
>>>> I thought we were talking about a group of mdev devices and not an
>>>> iommu group. IIRC, there were concerns about it (this would be similar
>>>> to UUID+instance) and that it would (ab)use iommu groups.
>>>
>>> What constraints does a group, which is not an iommu group, place on the
>>> usage of the mdev devices? What happens if we put two mdev devices in
>>> the same "mdev group" and then assign them to separate VMs/users? I
>>> believe that the answer is that this theoretical "mdev group" doesn't
>>> actually impose any constraints on the devices within the group or how
>>> they're used.
>>
>> We feel it's not a good idea to try to associate a device's iommu group
>> with mdev device groups. That adds more complications.
>>
>> As in the nodedev-create XML above, 'group1' could be a unique number
>> generated by libvirt. Then, to create an mdev device:
>>
>>   echo $UUID1:group1 > create
>>
>> If the user wants to add more mdev devices to the same group, he/she
>> should use the same group number for the next nodedev-create devices.
>> So the create commands would be:
>>
>>   echo $UUID2:group1 > create
>>   echo $UUID3:group1 > create
>
> So groups return to being static, libvirt would need to destroy and
> create mdev devices specifically for use within the predefined group?

Yes.

> This imposes limitations on how mdev devices can be used (ie. the mdev
> pool option is once again removed). We're also back to imposing
> grouping semantics on mdev devices that may not need them. Do all mdev
> devices for a given user need to be put into the same group?

Yes.

> Do groups span parent devices? Do they span different vendor drivers?

Yes and yes. The group number would be associated with the mdev device
irrespective of its parent.

>> Each mdev device would store this group number in its mdev_device
>> structure.
>>
>> With this, we would add open() and close() callbacks from the vfio_mdev
>> module for the vendor driver to commit resources.
>> Then we don't need
>> the 'start'/'stop' or online/offline interface.
>>
>> To commit resources for all devices associated with a domain/user-space
>> application, the vendor driver can commit them on the 'first open()' and
>> free them on the 'last close()'. Or, if the vendor driver wants to
>> commit resources for each device separately, it can do so in each
>> device's open() call. How to implement this is up to the vendor driver.
>>
>> Libvirt doesn't have to do anything with the assigned group numbers
>> while managing mdev devices.
>>
>> The QEMU command-line parameters would be the same as before (no need
>> to mention the group number here):
>>
>>   -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID1 \
>>   -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID2
>>
>> If two mdev devices from the same group are assigned to different
>> domains, we can fail the open() call of the second device. How would the
>> driver know that they are being used by different domains? By checking
>> the <group1, pid> of the first device of 'group1'. Both devices in the
>> same group should have the same pid in their open() call.
>
> Are you assuming that the two devices are owned by the same vendor
> driver?

No. See my replies to the next questions below.

> What if I put NVIDIA and Intel vGPUs both into the same group
> and give each of them to a separate VM?

It depends on where we put the logic that verifies the pid in the open()
call of each device in the group. If the <group, pid> check for the
devices in a group lives in the vendor driver, then in the above case
both VMs would boot. But if we impose this logic in the mdev core or the
vfio_mdev module, then open() on the second device should fail.

> How would the NVIDIA host
> driver know which <group, pid> the Intel device got?

How the group number is used to commit resources for the devices a
vendor owns would be that vendor driver's responsibility. The NVIDIA
driver doesn't need to know about Intel's vGPU, nor does the Intel
driver need to know about NVIDIA's vGPU.

> This is what the
> iommu groups do that a different layer of grouping cannot do. Maybe
> you're suggesting a group per vendor driver, but how does libvirt know
> the vendor driver? Do they need to go research the parent device in
> sysfs and compare driver links?

No, a group is not associated with a vendor driver. The group number is
associated with the mdev device.

Thanks,
Kirti
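
PS: To make the <group, pid> idea above a bit more concrete, here is a
minimal sketch of how the check could look if it lived in the mdev core /
vfio_mdev layer rather than in each vendor driver. None of these names
(mdev_group, mdev_group_open, mdev_group_release) exist in the current
code; they are made up purely to illustrate the "first open()" /
"last close()" bookkeeping, and assume grp->lock is initialized when the
group is created.

#include <linux/errno.h>
#include <linux/mutex.h>
#include <linux/pid.h>
#include <linux/sched.h>

struct mdev_group {
	unsigned int	group_id;	/* number from "echo $UUID:groupN > create" */
	struct pid	*owner;		/* pid of the first opener, NULL when idle */
	unsigned int	open_count;	/* open devices belonging to this group */
	struct mutex	lock;
};

/* Called by the mdev core before forwarding open() to the vendor driver. */
static int mdev_group_open(struct mdev_group *grp)
{
	int ret = 0;

	mutex_lock(&grp->lock);
	if (grp->open_count && grp->owner != task_pid(current)) {
		/* Second device of the group opened from a different domain. */
		ret = -EBUSY;
	} else {
		if (!grp->open_count)
			grp->owner = get_pid(task_pid(current)); /* first open() */
		grp->open_count++;
	}
	mutex_unlock(&grp->lock);

	return ret;
}

/* Called by the mdev core after the vendor driver's close()/release(). */
static void mdev_group_release(struct mdev_group *grp)
{
	mutex_lock(&grp->lock);
	if (!--grp->open_count) {
		put_pid(grp->owner);	/* last close() */
		grp->owner = NULL;
	}
	mutex_unlock(&grp->lock);
}

With something along these lines the NVIDIA/Intel example above would
fail the second open() with -EBUSY regardless of which vendor drivers
are involved, since the check sits above the vendor callbacks. (A real
implementation might compare the thread-group id rather than the
per-thread pid; this sketch just mirrors the "same pid in open()"
wording used earlier in the thread.)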