On Tue, Jun 28, 2022 at 09:54:19AM -0400, Steven Sistare wrote:

> >> As you and I have discussed, the count is also wrong in the direct
> >> exec model, because exec clears mm->locked_vm.
> >
> > Really? Yikes, I thought exec would generate a new mm?
>
> Yes, exec creates a new mm with locked_vm = 0. The old locked_vm count is dropped
> on the floor. The existing dma points to the same task, but task->mm has changed,
> and dma->task->mm->locked_vm is 0. An unmap ioctl drives it
> negative.

Oh.. This is probably a bug: vfio should never use task->mm. The mm
itself should be held using mmgrab instead. Otherwise the exec case is
broken as you describe.

> I have prototyped a few possible fixes. One changes vfio to use user->locked_vm.
> Another changes to mm->pinned_vm and preserves it during exec. A third preserves
> mm->locked_vm across exec, but that is not practical, because mm->locked_vm mixes
> vfio pins and mlocks. The mlock component must be cleared during exec, and we don't
> have a separate count for it.

Losing locked_vm on exec/fork is the correct and expected behavior for
the core kernel code. The bug is that vfio drives it negative.

Jason