Hi Alex, > From: Alex Williamson [mailto:alex.williamson@xxxxxxxxxx] > Sent: Thursday, July 4, 2019 1:22 AM > To: Liu, Yi L <yi.l.liu@xxxxxxxxx> > Subject: Re: [PATCH v1 9/9] smaples: add vfio-mdev-pci driver > > On Wed, 3 Jul 2019 08:25:25 +0000 > "Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote: > > > Hi Alex, > > > > Thanks for the comments. Have four inline responses below. And one > > of them need your further help. :-) [...] > > > > > > > > > > > > > > used iommu_attach_device() rather than iommu_attach_group() > > > > > > > > > for non-aux mdev iommu_device. Is there a requirement that > > > > > > > > > the mdev parent device is in a singleton iommu group? > > > > > > > > > > > > > > > > I don't think there should have such limitation. Per my > > > > > > > > understanding, vfio-mdev-pci should also be able to bind to > > > > > > > > devices which shares iommu group with other devices. vfio-pci works > well > > > for such devices. > > > > > > > > And since the two drivers share most of the codes, I think > > > > > > > > vfio-mdev-pci should naturally support it as well. > > > > > > > > > > > > > > Yes, the difference though is that vfio.c knows when devices are > > > > > > > in the same group, which mdev vfio.c only knows about the > > > > > > > non-iommu backed group, not the group that is actually used for > > > > > > > the iommu backing. So we either need to enlighten vfio.c or > > > > > > > further abstract those details in vfio_iommu_type1.c. > > > > > > > > > > > > Not sure if it is necessary to introduce more changes to vfio.c or > > > > > > vfio_iommu_type1.c. If it's only for the scenario which two > > > > > > devices share an iommu_group, I guess it could be supported by > > > > > > using __iommu_attach_device() which has no device counting for the > > > > > > group. But maybe I missed something here. It would be great if you > > > > > > can elaborate a bit for it. :-) > > > > > > > > > > We need to use the group semantics, there's a reason > > > > > __iommu_attach_device() is not exposed, it's an internal helper. I > > > > > think there's no way around that we need to somewhere track the > > > > > actual group we're attaching to and have the smarts to re-use it for > > > > > other devices in the same group. > > > > > > > > Hmmm, exposing __iommu_attach_device() is not good, let's forget it. > > > > :-) > > > > > > > > > > > > > If this is a simplification, then vfio-mdev-pci should not > > > > > > > > > bind to devices where this is violated since there's no way > > > > > > > > > to use the device. Can we support it though? > > > > > > > > > > > > > > > > yeah, I think we need to support it. > > > > I've already made vfio-mdev-pci driver work for non-singleton iommu > > group. e.g. for devices in a single iommu group, I can bind the devices > > to eithervfio-pci or vfio-mdev-pci and then passthru them to a VM. And > > it will fail if user tries to passthru a vfio-mdev-pci device via vfio-pci > > manner "-device vfio-pci,host=01:00.1". In other words, vfio-mdev-pci > > device can only passthru via > > "-device vfio-pci,sysfsdev=/sys/bus/mdev/devices/UUID". This is what > > we expect. > > > > However, I encountered a problem when trying to prevent user from > > passthru these devices to different VMs. I've tried in my side, and I > > can passthru vfio-pci device and vfio-mdev-pci device to different > > VMs. But actually this operation should be failed. If all the devices > > are bound to vfio-pci, Qemu will open iommu backed group. So > > Qemu can check if a given group has already been used by an > > AddressSpace (a.ka. VM) in vfio_get_group() thus to prevent > > user from passthru these devices to different VMs if the devices > > are in the same iommu backed group. However, here for a > > vfio-mdev-pci device, it has a new group and group ID, Qemu > > will not be able to detect if the other devices (share iommu group > > with vfio-mdev-pci device) are passthru to existing VMs. This is the > > major problem for vfio-mdev-pci to support non-singleton group > > in my side now. Even all devices are bound to vfio-mdev-pci driver, > > Qemu is still unable to check since all the vfio-mdev-pci devices > > have a separate mdev group. > > > > To fix it, may need Qemu to do more things. E.g. If it tries to use a > > non-singleton iommu backed group, it needs to check if any mdev > > group is created and used by an existing VM. Also it needs check if > > iommu backed group is passthru to an existing VM when trying to > > use a mdev group. For singleton iommu backed group and > > aux-domain enabled physical device, still allow to passthru mdev > > group to different VMs. To achieve these checks, Qemu may need > > to have knowledge whether a group is iommu backed and singleton > > or not. Do you think it is good to expose such info to userspace? or > > any other idea? :-) > > QEMU is never responsible for isolating a group, QEMU is just a > userspace driver, it's vfio's responsibility to prevent the user from > splitting groups in ways that are not allowed. QEMU does not know the yep, also my concern. > true group association, it only knows the "virtual" group of the mdev > device. QEMU will create a container and add the mdev virtual group to > the container. In the kernel, the type1 backend should actually do an > iommu_attach_group(), attaching the iommu_device group to the domain. > When QEMU processes the next device, it will have a different group, > but (assuming no vIOMMU) it will try to attach it to the same > container, which should work because the iommu_device group backing the > mdev virtual group is already attached to this domain. > If we had two separate QEMU processes, each with an mdev device from a > common group at the iommu_device level, the type1 backend should fail > to attach the group to the container for the later caller. I'd think > this should fail at the iommu_attach_group() call since the group we're > trying to attach is already attached to another domain. Agree with you. At first, I want to fail it in similar way with vfio-pci devices. For vfio-pci devices from a common group, vfio will fail the operation around /dev/vfio/group_id open if user tries to assign the vfio-pci devices from common group to multiple QEMU processes. Meanwhile, QEMU will avoid to open a /dev/vfio/group_id multiple times, so current vfio/QEMU works well for non- singleton group (no vIOMMU). Unfortunately, looks like we have no way to fail vfio-mdev-pci devices in similar mechanism as each mdev has a separate group. So yes, I agree with you that we may fail it around the group attach phase. Below is my draft idea: In vfio_iommu_type1_attach_group(), we need to do the following checks. if (mdev_group) { if (iommu_device group enabled aux-domain) { /* * iommu_group enabled aux-domain means the iommu_devices * in this group are aux-domain enabled. e.g. SIOV capable devices. * Also, I think for aux-domain enabled group, it essentially means * the group is a singleton group as SIOV capable devices require * to be in a singleton group. */ iommu_aux_attach_device(); } else { /* * needs to check the group->opened in vfio.c. Just like what * vfio_group_fopen() does. May be a new VFIO interface required * here since the group->opened is within vfio.c. * vfio_iommu_device_group_opened_inc() will inc group->opened, so * that other VM will fail when trying to open the group. And another * VFIO interface is also required to dec group->opened when VM is * down. */ if (vfio_iommu_device_group_opened_inc(iommu_device_group)) return -EBUSY; iommu_attach_gorup(iommu_device_group); } } The concern here is the two new VFIO interfaces. Any thoughts on this proposal? :-) > It's really unfortunate that we don't have the mdev inheriting the > iommu group of the iommu_device so that userspace can really understand > this relationship. A separate group makes sense for the aux-domain > case, and is (I guess) not a significant issue in the case of a > singleton iommu_device group, but it's pretty awkward here. Perhaps > this is something we should correct in design of iommu backed mdevs. Yeah, for aux-domain case, it is not significant issue as aux-domain essentially means singleton iommu_devie group. And in early time, when designing the support for wrap pci as a mdev, we also considered to let vfio-mdev-pci to reuse iommu_device group. But this results in an iommu backed group includes mdev and physical devices, which might also be strange. Do you think it is valuable to reconsider it? > Thanks, > > Alex Thanks, Yi Liu