On Wed, Sep 21, 2022 at 12:06:49PM -0600, Alex Williamson wrote: > > I still think the compat gaps are small. I've realized that > > VFIO_DMA_UNMAP_FLAG_VADDR has no implementation in qemu, and since it > > can deadlock the kernel I propose we purge it completely. > > Steve won't be happy to hear that, QEMU support exists but isn't yet > merged. If Steve wants to keep it then someone needs to fix the deadlock in the vfio implementation before any userspace starts to appear. I can fix the deadlock in iommufd in a terrible expensive way, but would rather we design a better interface if nobody is using it yet. I advocate for passing the memfd to the kernel and use that as the page provider, not a mm_struct. > The issue is where we account these pinned pages, where accounting is > necessary such that a user cannot lock an arbitrary number of pages > into RAM to generate a DoS attack. It is worth pointing out that preventing a DOS attack doesn't actually work because a *task* limit is trivially bypassed by just spawning more tasks. So, as a security feature, this is already very questionable. What we've done here is make the security feature work to actually prevent DOS attacks, which then gives you this problem: > This obviously has implications. AFAICT, any management tool that > doesn't instantiate assigned device VMs under separate users are > essentially untenable. Because now that the security feature works properly it detects the DOS created by spawning multiple tasks :( Somehow I was under the impression there was not user sharing in the common cases, but I guess I don't know that for sure. > > So, I still like 2 because it yields the smallest next step before we > > can bring all the parallel work onto the list, and it makes testing > > and converting non-qemu stuff easier even going forward. > > If a vfio compatible interface isn't transparently compatible, then I > have a hard time understanding its value. Please correct my above > description and implications, but I suspect these are not just > theoretical ABI compat issues. Thanks, Because it is just fine for everything that doesn't use the ulimit feature, which is still a lot of use cases! Remember, at this point we are not replacing /dev/vfio/vfio, this is just providing the general compat in a form that has to be opted into. I think if you open the /dev/iommu device node then you should get secured accounting. If /dev/vfio/vfio is provided by iommufd it may well have to trigger a different ulimit tracking - if that is the only sticking point it seems minor and should be addressed in some later series that adds /dev/vfio/vfio support to iommufd.. Jason