On Wed, Dec 11, 2024 at 12:25 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > On Tue, Dec 10, 2024 at 03:37:50PM -0800, Suren Baghdasaryan wrote: > > > > Replace vm_lock with vm_refcnt. Replace vm_detached with vm_refcnt == 0 > > > -- that is, attach sets refcount to 1 to indicate it is part of the mas, > > > detached is the final 'put'. > > > > I need to double-check if we ever write-lock a detached vma. I don't > > think we do but better be safe. If we do then that wait-until() should > > accept 0x8000'0001 as well. > > vma_start_write() > __is_vma_write_locked() > mmap_assert_write_locked(vma->vm_mm); > > So this really should hold afaict. > > > > RCU lookup does the inc_not_zero thing, when increment succeeds, compare > > > mm/addr to validate. > > > > > > vma_start_write() already relies on mmap_lock being held for writing, > > > and thus does not have to worry about writer-vs-writer contention, that > > > is fully resolved by mmap_sem. This means we only need to wait for > > > readers to drop out. > > > > > > vma_start_write() > > > add(0x8000'0001); // could fetch_add and double check the high > > > // bit wasn't already set. > > > wait-until(refcnt == 0x8000'0002); // mas + writer ref > > > WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq); > > > sub(0x8000'0000); > > > > > > vma_end_write() > > > put(); > > > > We don't really have vma_end_write(). Instead it's vma_end_write_all() > > which increments mm_lock_seq unlocking all write-locked VMAs. > > Therefore in vma_start_write() I think we can sub(0x8000'0001) at the > > end. > > Right, I know you don't, but you should :-), I've suggested adding this > before. I'll look into adding it. IIRC there was some issue with that but I can't recall... > > > > vma_start_read() then becomes something like: > > > > > > if (vm_lock_seq == mm_lock_seq) > > > return false; > > > > > > cnt = fetch_inc(1); > > > if (cnt & msb || vm_lock_seq == mm_lock_seq) { > > > put(); > > > return false; > > > } > > > > > > return true; > > > > > > vma_end_read() then becomes: > > > put(); > > > > > > > > > and the down_read() from uffffffd requires mmap_read_lock() and thus > > > does not have to worry about writers, it can simpy be inc() and put(), > > > no? > > > > I think your proposal should work. Let me try to code it and see if > > something breaks. > > Btw, for the wait-until() and put() you can use rcuwait; that is the > simplest wait form we have. It's suitable because we only ever have the > one waiter. Yes, Davidlohr mentioned that before. I'll do that. Thanks!