On Thu, Dec 19, 2024 at 08:14:24AM -0800, Suren Baghdasaryan wrote:
> On Thu, Dec 19, 2024 at 1:13 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Wed, Dec 18, 2024 at 01:53:17PM -0800, Suren Baghdasaryan wrote:
> >
> > > Ah, ok I see now. I completely misunderstood what for_each_vma_range()
> > > was doing.
> > >
> > > Then I think vma_start_write() should remain inside
> > > vms_gather_munmap_vmas() and all vmas in mas_detach should be
> >
> > No, it must not. You really are not modifying anything yet (except the
> > split, which we've already noted mark write themselves).
> >
> > > write-locked, even the ones we are not modifying. Otherwise what would
> > > prevent the race I mentioned before?
> > >
> > > __mmap_region
> > >     __mmap_prepare
> > >         vms_gather_munmap_vmas // adds vmas to be unmapped into mas_detach,
> > >                                // some locked by __split_vma(), some not locked
> > >
> > >                 lock_vma_under_rcu()
> > >                     vma = mas_walk // finds unlocked vma also in mas_detach
> > >                     vma_start_read(vma) // succeeds since vma is not locked
> > >                     // vma->detached, vm_start, vm_end checks pass
> > >                 // vma is successfully read-locked
> > >
> > >         vms_clean_up_area(mas_detach)
> > >             vms_clear_ptes
> > >                 // steps on a cleared PTE
> >
> > So here we have the added complexity that the vma is not unhooked at
> > all. Is there anything that would prevent a concurrent gup_fast() from
> > doing the same -- touch a cleared PTE?
> >
> > AFAICT two threads, one doing overlapping mmap() and the other doing
> > gup_fast() can result in exactly this scenario.
> >
> > If we don't care about the GUP case, when I'm thinking we should not
> > care about the lockless RCU case either.
> >
> > >     __mmap_new_vma
> > >         vma_set_range // installs new vma in the range
> > >     __mmap_complete
> > >         vms_complete_munmap_vmas // vmas are write-locked and detached
> > >                                  // but it's too late
> >
> > But at this point that old vma really is unhooked, and the
> > vma_write_start() here will ensure readers are gone and it will clear
> > PTEs *again*.
>
> So, to summarize, you want vma_start_write() and vma_mark_detached()
> to be done when we are removing the vma from the tree, right?

*after*

> Something like:

	vma_iter_store()
	vma_start_write()
	vma_mark_detached()

By having vma_start_write() after being unlinked you get the guarantee
of no concurrency. New lookups cannot find you (because of that
vma_iter_store()) and existing readers will be waited for.

> And the race I described is not a real problem since the vma is still
> in the tree, so gup_fast() does exactly that and will simply reinstall
> the ptes.

Just so.
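
For anyone following along, the ordering argument above (unlink from the tree first, then write-lock, then mark detached) can be sketched as a single-threaded toy model. This is only an illustration under loud assumptions: every toy_* name is hypothetical and none of this is kernel code. In particular, the real vma_start_write() waits for existing readers, whereas the toy version just reports whether any remain.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a vma; NOT the kernel's struct vm_area_struct. */
struct toy_vma {
	int readers;        /* readers currently holding the toy read lock */
	bool write_locked;
	bool detached;
};

#define TOY_SLOTS 4

/* Toy stand-in for the tree: one slot per address range. */
struct toy_tree {
	struct toy_vma *slot[TOY_SLOTS];
};

/* Models a lockless lookup followed by a read-lock attempt. */
static struct toy_vma *toy_lookup_and_read_lock(struct toy_tree *t, size_t idx)
{
	struct toy_vma *vma = idx < TOY_SLOTS ? t->slot[idx] : NULL;

	if (!vma || vma->write_locked || vma->detached)
		return NULL;
	vma->readers++;
	return vma;
}

static void toy_read_unlock(struct toy_vma *vma)
{
	vma->readers--;
}

/* Step 1: overwrite the slot, so new lookups no longer find the old vma. */
static void toy_store(struct toy_tree *t, size_t idx, struct toy_vma *vma)
{
	t->slot[idx] = vma;
}

/*
 * Step 2: take the write lock on the now-unlinked vma.
 * Assumption: returns false while readers remain instead of waiting.
 */
static bool toy_try_start_write(struct toy_vma *vma)
{
	if (vma->readers > 0)
		return false;
	vma->write_locked = true;
	return true;
}

/* Step 3: only now mark the old vma detached. */
static void toy_mark_detached(struct toy_vma *vma)
{
	vma->detached = true;
}
```

In this model, once toy_store() has replaced the old entry, a fresh toy_lookup_and_read_lock() can only return the replacement, so the only readers left on the old vma are those that found it before the store. After they drop out, toy_try_start_write() succeeds and toy_mark_detached() runs with nobody else looking.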