* Jann Horn <jannh@xxxxxxxxxx> [230117 16:04]: > On Mon, Jan 9, 2023 at 9:54 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > Introduce lock_vma_under_rcu function to lookup and lock a VMA during > > page fault handling. When VMA is not found, can't be locked or changes > > after being locked, the function returns NULL. The lookup is performed > > under RCU protection to prevent the found VMA from being destroyed before > > the VMA lock is acquired. VMA lock statistics are updated according to > > the results. > > For now only anonymous VMAs can be searched this way. In other cases the > > function returns NULL. > [...] > > +struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm, > > + unsigned long address) > > +{ > > + MA_STATE(mas, &mm->mm_mt, address, address); > > + struct vm_area_struct *vma, *validate; > > + > > + rcu_read_lock(); > > + vma = mas_walk(&mas); > > +retry: > > + if (!vma) > > + goto inval; > > + > > + /* Only anonymous vmas are supported for now */ > > + if (!vma_is_anonymous(vma)) > > + goto inval; > > + > > + if (!vma_read_trylock(vma)) > > + goto inval; > > + > > + /* Check since vm_start/vm_end might change before we lock the VMA */ > > + if (unlikely(address < vma->vm_start || address >= vma->vm_end)) { > > + vma_read_unlock(vma); > > + goto inval; > > + } > > + > > + /* Check if the VMA got isolated after we found it */ > > + mas.index = address; > > + validate = mas_walk(&mas); > > Question for Maple Tree experts: > > Are you allowed to use mas_walk() like this? If the first mas_walk() > call encountered a single-entry tree, it would store mas->node = > MAS_ROOT, right? And then the second call would go into > mas_state_walk(), mas_start() would return NULL, mas_is_ptr() would be > true, and then mas_state_walk() would return the result of > mas_start(), which is NULL? And we'd end up with mas_walk() returning > NULL on the second run even though the tree hasn't changed? This is safe for VMAs. There might be a bug in the tree regarding re-walking with a pointer, but it won't matter here. A single entry tree will be a pointer if the entry is of the range 0 - 0 (mas.index == 0, mas.last == 0). This would be a zero sized VMA - which is not valid. The second walk will check if the maple node is dead and restart the walk if it is dead. If the node isn't dead (almost always the case), then it will be a very quick walk. After a mas_walk(), the maple state has mas.index = vma->vm_start and mas.last = (vma->vm_end - 1). The address is set prior to the second walk in case of a vma split where mas.index from the first walk is on the other side of the split than address. > > > + if (validate != vma) { > > + vma_read_unlock(vma); > > + count_vm_vma_lock_event(VMA_LOCK_MISS); > > + /* The area was replaced with another one. */ > > + vma = validate; > > + goto retry; > > + } > > + > > + rcu_read_unlock(); > > + return vma; > > +inval: > > + rcu_read_unlock(); > > + count_vm_vma_lock_event(VMA_LOCK_ABORT); > > + return NULL; > > +}