On Sun, Dec 22, 2024 at 7:03 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Thu, Dec 19, 2024 at 10:55 AM Liam R. Howlett
> <Liam.Howlett@xxxxxxxxxx> wrote:
> >
> > * Peter Zijlstra <peterz@xxxxxxxxxxxxx> [241219 13:47]:
> > > On Thu, Dec 19, 2024 at 01:18:23PM -0500, Liam R. Howlett wrote:
> > >
> > > > > For RCU lookups only the mas tree matters -- and it's left present there.
> > > > >
> > > > > If you really want to block RCU readers, I would suggest punching a hole
> > > > > in the mm_mt. All the traditional code won't notice anyway, this is all
> > > > > with mmap_lock held for writing.
> > > >
> > > > We don't want to block all rcu readers, we want to block the rcu readers
> > > > that would see the problem - that is, anyone trying to read a particular
> > > > area.
> > > >
> > > > Right now we can page fault in unpopulated vmas while writing other vmas
> > > > to the tree. We are also moving more users to rcu reading to use the
> > > > vmas they need without waiting on writes to finish.
> > > >
> > > > Maybe I don't understand your suggestion, but I would think punching a
> > > > hole would lose this advantage?
> > >
> > > My suggestion was to remove the range stuck in mas_detach from mm_mt.
> > > That is exactly the affected range, no?
> >
> > Yes.
> >
> > But then looping over the vmas will show a gap where there should not be
> > a gap.
> >
> > If we stop rcu readers entirely we lose the advantage.
> >
> > This is exactly the issue that the locking dance was working around :)
>
> IOW we write-lock the entire range before removing any part of it, so
> that the whole transaction is atomic, correct?
>
> Peter, you suggested the following pattern for ensuring the vma is
> detached with no possible readers:
>
> vma_iter_store()
> vma_start_write()
> vma_mark_detached()
>
> What do you think about this alternative?
>
> vma_start_write()
> ...
> vma_iter_store()
> vma_mark_detached()
>         vma_assert_write_locked(vma)
>         if (unlikely(!refcount_dec_and_test(&vma->vm_refcnt)))
>                 vma_start_write()
>
> The second vma_start_write() is unlikely to be executed because the
> vma is locked: vm_refcnt might be increased only temporarily by
> readers before they realize the vma is locked, and that's a very
> narrow window, so I think performance should not visibly suffer.
> OTOH this would let us keep the current locking patterns and would
> guarantee that vma_mark_detached() always exits with a detached and
> unused vma (fewer possibilities for someone not following the exact
> pattern and ending up with a detached but still used vma).

I posted v7 of this patchset at
https://lore.kernel.org/all/20241226170710.1159679-1-surenb@xxxxxxxxxx/