On Thu, 22 Apr 2021 14:57:15 -0300
Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:

> > > The security rule for isolation is that once a device is attached to a
> > > /dev/ioasid fd then all other devices in that security group must be
> > > attached to the same ioasid FD or left unused.
> >
> > Sounds like a group... Note also that if those other devices are not
> > isolated from the user's device, the user could manipulate "unused"
> > devices via DMA. So even unused devices should be within the same
> > IOMMU context... thus attaching groups to IOMMU domains.
>
> That is a very interesting point. So, say, in the classic PCI bus
> world if I have a NIC and HD on my PCI bus and both are in the group,
> I assign the NIC to a /dev/ioasid & VFIO then it is possible to use
> the NIC to access the HD via DMA
>
> And here you want a more explicit statement that the HD is at risk by
> using the NIC?

If by "classic" you mean conventional PCI bus, then this is much worse
than simply "at risk". The IOMMU cannot differentiate devices behind a
PCIe-to-PCI bridge, so the moment you turn on the IOMMU context for the
NIC, the address space for your HBA is pulled out from under it. In
the vfio world, the NIC and HBA are grouped and managed together; the
user cannot change the IOMMU context of a group unless all of the
devices in the group are "viable", i.e. they are released from any host
drivers.

> Honestly, I'm not sure the current group FD is actually showing that
> very strongly - though I get the point it is modeled in the sysfs and
> kind of implicit in the API - we evolved things in a way where most
> actual applications are taking in a PCI BDF from the user, not a group
> reference. So the actual security impact seems lost on the user.

vfio users are extremely aware of grouping. They understand the model,
if not always the reason for the grouping. You only need to look at
r/VFIO to find various lsgroup scripts and kernel patches to manipulate
grouping.
The visibility to the user is valuable imo.

> Along my sketch if we have:
>
>    ioctl(vifo_device_fd, JOIN_IOASID_FD, ioasifd)
>    [..]
>    ioctl(vfio_device, ATTACH_IOASID, gpa_ioasid_id) == ENOPERM
>
> I would feel comfortable if the ATTACH_IOASID fails by default if all
> devices in the group have not been joined to the same ioasidfd.

And without a group representation to userspace, how would a user know
to resolve that?

> So in the NIC&HD example the application would need to do:
>
>    ioasid_fd = open("/dev/ioasid");
>    nic_device_fd = open("/dev/vfio/device0")
>    hd_device_fd = open("/dev/vfio/device1")
>
>    ioctl(nic_device_fd, JOIN_IOASID_FD, ioasifd)
>    ioctl(hd_device_fd, JOIN_IOASID_FD, ioasifd)
>    [..]
>    ioctl(nice_device, ATTACH_IOASID, gpa_ioasid_id) == SUCCESS
>
> Now the security relation is forced by the kernel to be very explicit.

But not discoverable to the user.

> However to keep current semantics, I'd suggest a flag on
> JOIN_IOASID_FD called "IOASID_IMPLICIT_GROUP" which has the effect of
> allowing the ATTACH_IOASID to succeed without the user having to
> explicitly join all the group devices. This is analogous to the world
> we have today of opening the VFIO group FD but only instantiating one
> device FD.
>
> In effect the ioasid FD becomes the group and the numbered IOASID's
> inside the FD become the /dev/vfio/vfio objects - we don't end up with
> fewer objects in the system, they just have different uAPI
> presentations.
>
> I'd envision applications like DPDK that are BDF centric to use the
> first API with some '--allow-insecure-vfio' flag to switch on the
> IOASID_IMPLICIT_GROUP. Maybe good applications would also print:
>   "Danger Will Robinson these PCI BDFs [...] are also at risk"
> When the switch is used by parsing the sysfs

So the groups still exist in sysfs, they just don't have vfio
representations?
An implicit grouping does what, automatically unbind the devices, so an
admin gives a user access to the NIC but their HBA device disappears
because they were implicitly linked? That's why vfio bases ownership
on the group: if a user owns the group but the group is not viable
because a device is still bound to another kernel driver, the user
can't do anything. Implicitly snarfing up subtly affected devices is
bad.

> > > Thus /dev/ioasid also becomes the unit of security and the IOMMU
> > > subsystem level becomes aware of and enforces the group security
> > > rules. Userspace does not need to "see" the group
> >
> > What tools does userspace have to understand isolation of individual
> > devices without groups?
>
> I think we can continue to show all of this group information in sysfs
> files, it just doesn't require application code to open a group FD.
>
> This becomes relavent the more I think about it - elmininating the
> group and container FD uAPI by directly creating the device FD also
> sidesteps questions about how to model these objects in a /dev/ioasid
> only world. We simply don't have them at all so the answer is pretty
> easy.

I'm not sold. Ideally each device would be fully isolated, then we
could assume a 1:1 relation of group and device and collapse the model
to work on devices. We don't live in that world and I see a benefit to
making that explicit in the uapi, even if that group fd might seem
superfluous at times. Thanks,

Alex
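
For illustration only, the viability rule described above (a group's
IOMMU context can be changed only once every device in the group is
released from host drivers) might be sketched as below. This is a
simplified model, not kernel code: struct group_dev, dev_is_released(),
group_first_blocker() and group_is_viable() are hypothetical names, and
treating "released" as "unbound or bound to vfio-pci" is an assumption
made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one device in an IOMMU group. */
struct group_dev {
	const char *bdf;	/* PCI address, e.g. "0000:03:00.0" */
	const char *driver;	/* bound driver name, NULL if unbound */
};

/*
 * Assumption for this sketch: a device counts as released from host
 * drivers when it has no driver or is bound to vfio-pci.
 */
static bool dev_is_released(const struct group_dev *dev)
{
	return dev->driver == NULL || strcmp(dev->driver, "vfio-pci") == 0;
}

/*
 * Return the first device that keeps the group from being viable, or
 * NULL when every device in the group is released.
 */
static const struct group_dev *
group_first_blocker(const struct group_dev *devs, size_t n)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!dev_is_released(&devs[i]))
			return &devs[i];
	}
	return NULL;
}

/* The group is viable only if no device blocks it. */
static bool group_is_viable(const struct group_dev *devs, size_t n)
{
	return group_first_blocker(devs, n) == NULL;
}
```

In the NIC and HBA example, a group holding the NIC on vfio-pci and the
HBA still bound to its host storage driver would not be viable, and
group_first_blocker() would point at the HBA, which is the information
a user needs in order to resolve the failure.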