On Tue, Jan 17, 2023 at 10:28 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> On Tue, Jan 17, 2023 at 10:03 AM Jann Horn <jannh@xxxxxxxxxx> wrote:
> >
> > +locking maintainers
>
> Thanks! I'll CC the locking maintainers in the next posting.
>
> >
> > On Mon, Jan 9, 2023 at 9:54 PM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > > Introduce a per-VMA rw_semaphore to be used during page fault handling
> > > instead of mmap_lock. Because there are cases when multiple VMAs need
> > > to be exclusively locked during VMA tree modifications, instead of the
> > > usual lock/unlock pattern we mark a VMA as locked by taking per-VMA lock
> > > exclusively and setting vma->lock_seq to the current mm->lock_seq. When
> > > mmap_write_lock holder is done with all modifications and drops mmap_lock,
> > > it will increment mm->lock_seq, effectively unlocking all VMAs marked as
> > > locked.
> > [...]
> > > +static inline void vma_read_unlock(struct vm_area_struct *vma)
> > > +{
> > > +        up_read(&vma->lock);
> > > +}
> >
> > One thing that might be gnarly here is that I think you might not be
> > allowed to use up_read() to fully release ownership of an object -
> > from what I remember, I think that up_read() (unlike something like
> > spin_unlock()) can access the lock object after it's already been
> > acquired by someone else. So if you want to protect against concurrent
> > deletion, this might have to be something like:
> >
> > rcu_read_lock(); /* keeps vma alive */
> > up_read(&vma->lock);
> > rcu_read_unlock();
>
> But for deleting VMA one would need to write-lock the vma->lock first,
> which I assume can't happen until this up_read() is complete. Is that
> assumption wrong?

__up_read() does:

        rwsem_clear_reader_owned(sem);
        tmp = atomic_long_add_return_release(-RWSEM_READER_BIAS, &sem->count);
        DEBUG_RWSEMS_WARN_ON(tmp < 0, sem);
        if (unlikely((tmp & (RWSEM_LOCK_MASK|RWSEM_FLAG_WAITERS)) ==
                      RWSEM_FLAG_WAITERS)) {
                clear_nonspinnable(sem);
                rwsem_wake(sem);
        }

The atomic_long_add_return_release() is the point where we are doing
the main lock-releasing.

So if a reader dropped the read-lock while someone else was waiting on
the lock (RWSEM_FLAG_WAITERS) and no other readers were holding the
lock together with it, the reader also does clear_nonspinnable() and
rwsem_wake() afterwards.

But in rwsem_down_write_slowpath(), after we've set
RWSEM_FLAG_WAITERS, we can return successfully immediately once
rwsem_try_write_lock() sees that there are no active readers or
writers anymore (if RWSEM_LOCK_MASK is unset and the cmpxchg
succeeds). We're not necessarily waiting for the "nonspinnable" bit or
the wake.

So yeah, I think down_write() can return successfully before up_read()
is done with its memory accesses.

(Spinlocks are different - the kernel relies on being able to drop
references via spin_unlock() in some places.)
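
To make that interleaving concrete, here is a rough user-space sketch in C11 that replays the steps in a fixed order on one thread. It is only a model: every toy_/TOY_ name is made up for illustration, the bit values are not the kernel's actual count layout, and the try-lock is far simpler than rwsem_try_write_lock() (no handoff, no waiter list). It only shows that the writer's cmpxchg can succeed between the reader's releasing atomic and the reader's wake path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative bit layout, NOT the kernel's real rwsem count encoding. */
#define TOY_WRITER_LOCKED 1L
#define TOY_FLAG_WAITERS  2L
#define TOY_READER_BIAS   256L
#define TOY_LOCK_MASK     (~(TOY_READER_BIAS - 1) | TOY_WRITER_LOCKED)

struct toy_rwsem {
    atomic_long count;
};

struct toy_result {
    bool writer_acquired;      /* writer's try-lock succeeded */
    bool reader_still_pending; /* reader has yet to run its wake path */
};

/* The releasing atomic of the reader's unlock; returns the new count. */
static long toy_up_read_release(struct toy_rwsem *sem)
{
    return atomic_fetch_sub_explicit(&sem->count, TOY_READER_BIAS,
                                     memory_order_release) - TOY_READER_BIAS;
}

/* True if the reader would go on to touch sem again (the wake path). */
static bool toy_up_read_needs_wake(long tmp)
{
    return (tmp & (TOY_LOCK_MASK | TOY_FLAG_WAITERS)) == TOY_FLAG_WAITERS;
}

/* Simplified writer try-lock: succeeds once no reader or writer holds sem. */
static bool toy_try_write_lock(struct toy_rwsem *sem)
{
    long old = atomic_load(&sem->count);

    if (old & TOY_LOCK_MASK)
        return false;
    return atomic_compare_exchange_strong(&sem->count, &old,
                                          old | TOY_WRITER_LOCKED);
}

/* Replay: one reader holds the lock, one writer is already queued. */
static struct toy_result toy_interleaving(void)
{
    struct toy_rwsem sem;
    struct toy_result res;
    long tmp;

    atomic_init(&sem.count, TOY_READER_BIAS | TOY_FLAG_WAITERS);

    /* Reader: release the lock, but do not run the wake path yet. */
    tmp = toy_up_read_release(&sem);
    assert(tmp >= 0);
    res.reader_still_pending = toy_up_read_needs_wake(tmp);

    /* Writer: runs in the window before the reader's wake path. */
    res.writer_acquired = toy_try_write_lock(&sem);

    /*
     * At this point a real writer could go on to free the object that
     * embeds sem while the reader still intends to access it.
     */
    return res;
}
```

In this model toy_interleaving() returns with both fields true: the writer holds the lock while the reader still has lock accesses outstanding. That window is what the rcu_read_lock()/rcu_read_unlock() pair around up_read() is meant to cover, assuming the VMA is freed only after an RCU grace period.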