On Tue, Jan 17, 2023 at 12:34 AM Hillf Danton <hdanton@xxxxxxxx> wrote: > > On Mon, 16 Jan 2023 20:52:45 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx> > > On Mon, Jan 16, 2023 at 7:16 PM Hillf Danton <hdanton@xxxxxxxx> wrote: > > > No you are not. > > > > I'm not wrong or the other way around? Please expand a bit. > > You are not wrong. Ok, I think if I rewrite the vma_read_trylock() we should be fine?: static inline bool vma_read_trylock(struct vm_area_struct *vma) { int count, new; /* Check before locking. A race might cause false locked result. */ if (READ_ONCE(vma->vm_lock->lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq)) return false; count = atomic_read(&vma->vm_lock->count); for (;;) { /* * Is VMA is write-locked? Overflow might produce false locked result. * False unlocked result is impossible because we modify and check * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq * modification invalidates all existing locks. */ if (count < 0) return false; new = count + 1; /* If atomic_t overflows, fail to lock. */ if (new < 0) return false; /* * Atomic RMW will provide implicit mb on success to pair with smp_wmb in * vma_write_lock, on failure we retry. */ new = atomic_cmpxchg(&vma->vm_lock->count, count, new); if (new == count) break; count = new; cpu_relax(); } if (unlikely(READ_ONCE(vma->vm_lock->lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))) { if (atomic_dec_and_test(&vma->vm_lock->count)) wake_up(&vma->vm_mm->vma_writer_wait); return false; } return true; } > > > > > > If the writer lock owner is preempted by a reader while releasing lock, > > > > > > set count to zero > > > <-- preempt > > > wake up waiters > > > > > > then lock is owned by reader but with read waiters. > > > > > > That is buggy if write waiter starvation is allowed in this patchset. > > > > I don't quite understand your point here. Readers don't wait, so there > > can't be "read waiters". Could you please expand with a race diagram > > maybe? > > cpu3 cpu2 > --- --- > taskA bond to cpu3 > down_write(&mm->mmap_lock); > vma_write_lock L > taskB fail to take L for read > taskC fail to take mmap_lock for write > taskD fail to take L for read > vma_write_unlock_mm(mm); > > preempted by taskE > taskE take L for read and > read waiters of L, taskB and taskD, > should be woken up > > up_write(&mm->mmap_lock); Readers never wait for vma lock, that's why we have only vma_read_trylock and no vma_read_lock. In your scenario taskB and taskD will fall back to taking mmap_lock for read after they failed vma_read_trylock. Once taskA does up_write(mmap_lock) they will be woken up since they are blocked on taking mmap_lock for read. >