On Wed, Sep 21, 2022 at 03:44:24PM -0300, Jason Gunthorpe wrote:

> If /dev/vfio/vfio is provided by iommufd it may well have to trigger a
> different ulimit tracking - if that is the only sticking point it
> seems minor and should be addressed in some later series that adds
> /dev/vfio/vfio support to iommufd..

And I have come up with a nice idea for this that feels OK

 - Add a 'pin accounting compat' flag to struct iommufd_ctx (e.g. per FD)

   The flag is set to 1 if /dev/vfio/vfio was the cdev that opened the
   ctx

   An IOCTL issued by a CAP_SYS_ADMIN caller can also set the flag

 - If the flag is set we do not account pins against the user.
   Instead we account for pins in the FD. A single FD cannot exceed
   the rlimit.

This nicely emulates the desired behavior from virtualization without
creating all the problems with exec/fork/etc that per-task tracking
has.

Even in iommufd native mode a privileged virtualization layer can use
the ioctl to enter the old mode and pass the fd to qemu under a shared
user. This should ease migration I guess.

It can still be oversubscribed but it is now limited to the number of
iommufd_ctx's *with devices* that the userspace can create. Since each
device can be attached to only 1 iommufd this is a stronger limit than
the task limit. With 1 device given to qemu, enforcement is exact.

(ignoring that a hostile qemu can still blow past the rlimit using
concurrent rdma or io_uring)

It is a small incremental step - does this suitably address the
concern?

Jason
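
To make the accounting rule concrete, here is a rough userspace model in C of what the flag would select. It is only a sketch of the idea, not kernel code: struct user_model, struct ctx_model, model_pin() and model_unpin() are hypothetical names invented for illustration, and the limit is passed in as a plain page count rather than read from the real rlimit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-user pinned page counter. */
struct user_model {
	uint64_t locked_pages;
};

/* Hypothetical stand-in for struct iommufd_ctx; field names are illustrative. */
struct ctx_model {
	struct user_model *user;
	/* Assumed set when opened via /dev/vfio/vfio or by a privileged ioctl. */
	bool pin_accounting_compat;
	/* Per-FD pinned page count, only used in compat mode. */
	uint64_t pinned_pages;
};

/* Pick the counter the flag selects: per-FD in compat mode, else per-user. */
static uint64_t *model_counter(struct ctx_model *ctx)
{
	return ctx->pin_accounting_compat ? &ctx->pinned_pages
					  : &ctx->user->locked_pages;
}

/* Returns 0 on success, -1 if the pin would push the counter past the limit. */
static int model_pin(struct ctx_model *ctx, uint64_t npages,
		     uint64_t rlimit_pages)
{
	uint64_t *counter = model_counter(ctx);

	/* Written to avoid overflow in *counter + npages. */
	if (npages > rlimit_pages || *counter > rlimit_pages - npages)
		return -1;
	*counter += npages;
	return 0;
}

static void model_unpin(struct ctx_model *ctx, uint64_t npages)
{
	uint64_t *counter = model_counter(ctx);

	assert(*counter >= npages);
	*counter -= npages;
}
```

In this model two compat-mode contexts owned by the same user can each pin up to the limit, which is the oversubscription described above, while two native-mode contexts share the single per-user counter.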