On Tue, Mar 22, 2022 at 09:29:23AM -0600, Alex Williamson wrote: > I'm still picking my way through the series, but the later compat > interface doesn't mention this difference as an outstanding issue. > Doesn't this difference need to be accounted in how libvirt manages VM > resource limits? AFACIT, no, but it should be checked. > AIUI libvirt uses some form of prlimit(2) to set process locked > memory limits. Yes, and ulimit does work fully. prlimit adjusts the value: int do_prlimit(struct task_struct *tsk, unsigned int resource, struct rlimit *new_rlim, struct rlimit *old_rlim) { rlim = tsk->signal->rlim + resource; [..] if (new_rlim) *rlim = *new_rlim; Which vfio reads back here: drivers/vfio/vfio_iommu_type1.c: unsigned long pfn, limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; drivers/vfio/vfio_iommu_type1.c: unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; And iommufd does the same read back: lock_limit = task_rlimit(pages->source_task, RLIMIT_MEMLOCK) >> PAGE_SHIFT; npages = pages->npinned - pages->last_npinned; do { cur_pages = atomic_long_read(&pages->source_user->locked_vm); new_pages = cur_pages + npages; if (new_pages > lock_limit) return -ENOMEM; } while (atomic_long_cmpxchg(&pages->source_user->locked_vm, cur_pages, new_pages) != cur_pages); So it does work essentially the same. The difference is more subtle, iouring/etc puts the charge in the user so it is additive with things like iouring and additively spans all the users processes. However vfio is accounting only per-process and only for itself - no other subsystem uses locked as the charge variable for DMA pins. The user visible difference will be that a limit X that worked with VFIO may start to fail after a kernel upgrade as the charge accounting is now cross user and additive with things like iommufd. This whole area is a bit peculiar (eg mlock itself works differently), IMHO, but with most of the places doing pins voting to use user->locked_vm as the charge it seems the right path in today's kernel. Ceratinly having qemu concurrently using three different subsystems (vfio, rdma, iouring) issuing FOLL_LONGTERM and all accounting for RLIMIT_MEMLOCK differently cannot be sane or correct. I plan to fix RDMA like this as well so at least we can have consistency within qemu. Thanks, Jason