On Mon, Jan 16, 2023 at 7:16 PM Hillf Danton <hdanton@xxxxxxxx> wrote:
>
> On Mon, 16 Jan 2023 15:08:48 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > On Mon, Jan 16, 2023 at 6:07 AM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > > On Mon, 9 Jan 2023 12:53:36 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > > >  static inline bool vma_read_trylock(struct vm_area_struct *vma)
> > > >  {
> > > >  	/* Check before locking. A race might cause false locked result. */
> > > > -	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > > +	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > >  		return false;
> > >
> > > Add mb to pair with the above wmb like
> >
> > The wmb above is to ensure the ordering between updates of lock_seq
> > and vm_lock->count (lock_seq is updated first and vm_lock->count only
> > after that). The first access to vm_lock->count in this function is
> > atomic_inc_unless_negative() and it's an atomic RMW operation with a
> > return value. According to the documentation such functions are fully
> > ordered, therefore I think we already have an implicit full memory
> > barrier between the reads of lock_seq and vm_lock->count here. Am I wrong?
>
> No you are not.

I'm not wrong, or is it the other way around? Please expand a bit.

> Revisit it in case the full mb is not ensured.
>
> #ifndef arch_atomic_inc_unless_negative
> static __always_inline bool
> arch_atomic_inc_unless_negative(atomic_t *v)
> {
> 	int c = arch_atomic_read(v);
>
> 	do {
> 		if (unlikely(c < 0))
> 			return false;
> 	} while (!arch_atomic_try_cmpxchg(v, &c, c + 1));
>
> 	return true;
> }
> #define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
> #endif

I think your point is that the full mb is not ensured here, specifically
if c < 0?
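Just so we are looking at the same thing, here is roughly the read-trylock
path I have in mind. This is only a simplified sketch to anchor the
discussion (waiter wakeup and other details from the actual patch are left
out), with comments marking where I believe the implicit full ordering
comes from and where it does not apply:

static inline bool vma_read_trylock(struct vm_area_struct *vma)
{
	/* Check before locking. A race might cause false locked result. */
	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
		return false;

	/*
	 * On success atomic_inc_unless_negative() is a value-returning
	 * atomic RMW and is fully ordered, so it acts as the full barrier
	 * between the lock_seq read above and the count update. On the
	 * early-return path (count already negative) no RMW happens and
	 * no ordering is implied, but we bail out to mmap_lock anyway.
	 */
	if (!atomic_inc_unless_negative(&vma->vm_lock->count))
		return false;

	/* Recheck now that we hold the reader count. */
	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq)) {
		atomic_dec(&vma->vm_lock->count);
		return false;
	}
	return true;
}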
> > >	int count = atomic_read(&vma->vm_lock->count);
> > >	for (;;) {
> > >		int new = count + 1;
> > >
> > >		if (count < 0 || new < 0)
> > >			return false;
> > >
> > >		new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> > >		if (new == count)
> > >			break;
> > >		count = new;
> > >		cpu_relax();
> > >	}
> > >
> > >	(wake up waiting readers after taking the lock;
> > >	but the write lock owner before this read trylock should be
> > >	responsible for waking waiters up.)
> > >
> > >	lock acquire for read.
> >
> > This scheme might cause readers to wait, which is not an exact
>
> Could you specify a bit more on the waiting?

Oh, I misunderstood your intent with that for() loop. Indeed, if a
writer has taken the lock, the count will be negative and the loop
terminates with a failure. Yeah, I think that would work.

> > replacement for down_read_trylock(). The requirement to wake up
> > waiting readers also complicates things
>
> If the writer lock owner is preempted by a reader while releasing the lock,
>
>	set count to zero
>			<-- preempt
>	wake up waiters
>
> then the lock is owned by the reader but with read waiters.
>
> That is buggy if write waiter starvation is allowed in this patchset.

I don't quite understand your point here. Readers don't wait, so there
can't be "read waiters". Could you please expand, maybe with a race
diagram?

> > and since we can always fall
> > back to mmap_lock, that complication is unnecessary IMHO. I could use
> > part of your suggestion like this:
> >
> >	int new = count + 1;
> >
> >	if (count < 0 || new < 0)
> >		return false;
> >
> >	new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> >	if (new == count)
> >		return false;
> >
> > Compared to doing atomic_inc_unless_negative() first, like I did
> > originally, this scheme opens a slightly wider window for the writer
> > to get in the middle and cause the reader to fail locking, but I don't
> > think it would result in any visible regression.
>
> It is definitely fine for the writer to acquire the lock while the
> reader is doing trylock, no?

Yes.
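To make that concrete, here is how the single-cmpxchg idea could be folded
into the helper. Again just a sketch for discussion, not the patch itself;
it assumes the intended success test is that atomic_cmpxchg() returns the
old value we expected (i.e. our increment went in), and it keeps the
recheck-and-decrement from the current version:

static inline bool vma_read_trylock(struct vm_area_struct *vma)
{
	int count, new;

	/* Check before locking. A race might cause false locked result. */
	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
		return false;

	count = atomic_read(&vma->vm_lock->count);
	new = count + 1;

	/* A negative count means a writer owns or is taking the lock. */
	if (count < 0 || new < 0)
		return false;

	/*
	 * atomic_cmpxchg() returns the old value; if it matches what we
	 * read, our increment is in place and we hold a reader reference.
	 * Any concurrent change, by a reader or a writer, simply fails
	 * the trylock and we fall back to mmap_lock - no retry, no waiting.
	 */
	if (atomic_cmpxchg(&vma->vm_lock->count, count, new) != count)
		return false;

	/* Recheck in case a writer invalidated the lock in the meantime. */
	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq)) {
		atomic_dec(&vma->vm_lock->count);
		return false;
	}
	return true;
}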