On Mon, Apr 15, 2024 at 11:15:03PM +0100, Matthew Wilcox wrote:
> On Mon, Apr 15, 2024 at 03:05:44PM -0700, Vishal Moola wrote:
> > Commit 9acad7ba3e25 ("hugetlb: use vmf_anon_prepare() instead of
> > anon_vma_prepare()") may bailout after allocating a folio if we do not
> > hold the mmap lock. When this occurs, vmf_anon_prepare() will release the
> > vma lock. Hugetlb then attempts to call restore_reserve_on_error(),
> > which depends on the vma lock being held.
> >
> > We can move vmf_anon_prepare() prior to the folio allocation in order to
> > avoid calling restore_reserve_on_error() without the vma lock.
>
> But now you're calling vmf_anon_prepare() in the wrong place -- before
> we've determined that we need an anon folio. So we'll create an
> anon_vma even when we don't need one for this vma.
>
> This is definitely a pre-existing bug which you've exposed by making it
> happen more easily. Needs a different fix though.

I do not think this is a pre-existing bug.

Prior to commit 7c43a553792a ("hugetlb: allow faults to be handled under
the VMA lock"), we would just bail out if we had FAULT_FLAG_VMA_LOCK.
So there was no danger in calling functions that fiddle with vmas, like
restore_reserve_on_error() does.

After that commit, we allow such faults, but vmf_anon_prepare() releases
the vma lock and returns VM_FAULT_RETRY if we really need to allocate an
anon_vma. The problem is that restore_reserve_on_error() will now
re-adjust the reservations without the vma lock held, which is
completely unsafe.

I think the safest way to tackle this is to do just as Vishal did: call
vmf_anon_prepare() upfront, but only for non-VM_MAYSHARE faults.

-- 
Oscar Salvador
SUSE Labs