Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock

On Mon, 16 Jan 2023 15:08:48 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> On Mon, Jan 16, 2023 at 6:07 AM Hillf Danton <hdanton@xxxxxxxx> wrote:
> > On Mon, 9 Jan 2023 12:53:36 -0800 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > >  static inline bool vma_read_trylock(struct vm_area_struct *vma)
> > >  {
> > >       /* Check before locking. A race might cause false locked result. */
> > > -     if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > +     if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > >               return false;
> >
> > Add an mb to pair with the above wmb, like:
> 
> The wmb above is to ensure the ordering between the updates of lock_seq
> and vm_lock->count (lock_seq is updated first and vm_lock->count only
> after that). The first access to vm_lock->count in this function is
> atomic_inc_unless_negative(), which is an atomic RMW operation with a
> return value. According to the documentation, such functions are fully
> ordered, therefore I think we already have an implicit full memory
> barrier between the reads of lock_seq and vm_lock->count here. Am I wrong?

No, you are not. Revisit it in case the full mb is not ensured.

#ifndef arch_atomic_inc_unless_negative
static __always_inline bool
arch_atomic_inc_unless_negative(atomic_t *v)
{
	int c = arch_atomic_read(v);

	do {
		if (unlikely(c < 0))
			return false;
	} while (!arch_atomic_try_cmpxchg(v, &c, c + 1));

	return true;
}
#define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
#endif
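
The implicit full barrier comes from the value-returning RMW itself:
the unsuffixed try_cmpxchg above is fully ordered when it succeeds
(see Documentation/atomic_t.txt). A rough sketch of the pairing, with
the writer-side sequence taken from your description (sketch only, not
the patch itself; the -1 "write-locked" value is illustrative):

/* writer: lock_seq is updated first, count only after the wmb */
static void vma_write_lock_sketch(struct vm_area_struct *vma)
{
	WRITE_ONCE(vma->vm_lock->lock_seq, vma->vm_mm->mm_lock_seq);
	smp_wmb();				/* order lock_seq before count */
	atomic_set(&vma->vm_lock->count, -1);	/* illustrative write-locked value */
}

/* reader: no explicit smp_mb() is needed before the increment */
static bool vma_read_trylock_sketch(struct vm_area_struct *vma)
{
	/* Check before locking. A race might cause false locked result. */
	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
		return false;
	/*
	 * Fully ordered RMW on success: the lock_seq read above
	 * cannot be reordered past the increment.
	 */
	return atomic_inc_unless_negative(&vma->vm_lock->count);
}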

> >         int count = atomic_read(&vma->vm_lock->count);
> >         for (;;) {
> >                 int new = count + 1;
> >
> >                 if (count < 0 || new < 0)
> >                         return false;
> >
> >                 new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> >                 if (new == count)
> >                         break;
> >                 count = new;
> >                 cpu_relax();
> >         }
> >
> >         (Wake up waiting readers after taking the lock; though the
> >         write-lock owner that ran before this read trylock should be
> >         the one responsible for waking waiters up.)
> >
> >         Lock acquired for read.
> 
> This scheme might cause readers to wait, which is not an exact

Could you be more specific about the wait?

> replacement for down_read_trylock(). The requirement to wake up
> waiting readers also complicates things

If the write-lock owner is preempted by a reader while releasing the lock,

	set count to zero
			  <-- preempted here
	wake up waiters

then the lock is owned by the reader, but with read waiters still queued.

That is buggy if write-waiter starvation is allowed in this patchset.
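
In C, the window is roughly this (sketch only; the unlock function name
and the waitqueue field are my assumptions, not taken from the patch):

/* sketch of the race window; names assumed, not from the patch */
static void vma_write_unlock_sketch(struct vm_area_struct *vma)
{
	atomic_set(&vma->vm_lock->count, 0);	/* lock released */
	/*
	 * <-- a reader preempting here sees count == 0, takes the
	 *     lock via the cmpxchg loop, and the wakeup below then
	 *     runs with the lock already owned by that reader.
	 */
	wake_up(&vma->vm_lock->wq);		/* waitqueue field assumed */
}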

> and since we can always fall
> back to mmap_lock, that complication is unnecessary IMHO. I could use
> part of your suggestion like this:
> 
>                  int count = atomic_read(&vma->vm_lock->count);
>                  int new = count + 1;
> 
>                  if (count < 0 || new < 0)
>                          return false;
> 
>                  new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
>                  if (new != count)
>                          return false;
> 
> Compared to doing atomic_inc_unless_negative() first, like I did
> originally, this scheme opens a slightly wider window for the writer to
> get in the middle and cause the reader to fail the lock, but I don't
> think it would result in any visible regression.

It is definitely fine for a writer to acquire the lock while a reader is
doing trylock, no?
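
For reference, folding your snippet into the trylock gives something
like the below (sketch only; I added the initial read, took cmpxchg
success as lock acquired, and elided whatever your original function
does after taking the lock):

static bool vma_read_trylock_cmpxchg_sketch(struct vm_area_struct *vma)
{
	int count, new;

	/* Check before locking. A race might cause false locked result. */
	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
		return false;

	count = atomic_read(&vma->vm_lock->count);
	new = count + 1;

	/* negative count means write-locked; new < 0 guards overflow */
	if (count < 0 || new < 0)
		return false;

	/* one shot: on any concurrent change, fail and fall back to mmap_lock */
	if (atomic_cmpxchg(&vma->vm_lock->count, count, new) != count)
		return false;

	return true;
}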



