Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Dec 10, 2024 at 2:39 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Nov 12, 2024 at 07:18:45AM -0800, Suren Baghdasaryan wrote:
> > On Mon, Nov 11, 2024 at 8:58 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> > >
> > > On Mon, Nov 11, 2024 at 12:55:05PM -0800, Suren Baghdasaryan wrote:
> > > > When a reader takes read lock, it increments the atomic, unless the
> > > > top two bits are set indicating a writer is present.
> > > > When writer takes write lock, it sets VMA_LOCK_WR_LOCKED bit if there
> > > > are no readers or VMA_LOCK_WR_WAIT bit if readers are holding the lock
> > > > and puts itself onto newly introduced mm.vma_writer_wait. Since all
> > > > writers take mmap_lock in write mode first, there can be only one writer
> > > > at a time. The last reader to release the lock will signal the writer
> > > > to wake up.
> > >
> > > I don't think you need two bits.  You can do it this way:
> > >
> > > 0x8000'0000 - No readers, no writers
> > > 0x1-7fff'ffff - Some number of readers
> > > 0x0 - Writer held
> > > 0x8000'0001-0xffff'ffff - Reader held, writer waiting
> > >
> > > A prospective writer subtracts 0x8000'0000.  If the result is 0, it got
> > > the lock, otherwise it sleeps until it is 0.
> > >
> > > A writer unlocks by adding 0x8000'0000 (not by setting the value to
> > > 0x8000'0000).
> > >
> > > A reader unlocks by adding 1.  If the result is 0, it wakes the writer.
> > >
> > > A prospective reader subtracts 1.  If the result is positive, it got the
> > > lock, otherwise it does the unlock above (this might be the one which
> > > wakes the writer).
> > >
> > > And ... that's it.  See how we use the CPU arithmetic flags to tell us
> > > everything we need to know without doing arithmetic separately?
> >
> > Yes, this is neat! You are using the fact that write-locked == no
> > readers to eliminate unnecessary state. I'll give that a try. Thanks!
>
> The reason I got here is that Vlastimil poked me about the whole
> TYPESAFE_BY_RCU thing.
>
> So the normal way those things work is with a refcount, if the refcount
> is non-zero, the identifying fields should be stable and you can
> determine if you have the right object, otherwise tough luck.
>
> And I was thinking that since you abuse this rwsem you have, you might
> as well turn that into a refcount with some extra.
>
> So I would propose a slightly different solution.
>
> Replace vm_lock with vm_refcnt. Replace vm_detached with vm_refcnt == 0
> -- that is, attach sets refcount to 1 to indicate it is part of the mas,
> detached is the final 'put'.

I need to double-check if we ever write-lock a detached vma. I don't
think we do but better be safe. If we do then that wait-until() should
accept 0x8000'0001 as well.

>
> RCU lookup does the inc_not_zero thing, when increment succeeds, compare
> mm/addr to validate.
>
> vma_start_write() already relies on mmap_lock being held for writing,
> and thus does not have to worry about writer-vs-writer contention, that
> is fully resolved by mmap_sem. This means we only need to wait for
> readers to drop out.
>
> vma_start_write()
>         add(0x8000'0001); // could fetch_add and double check the high
>                           // bit wasn't already set.
>         wait-until(refcnt == 0x8000'0002); // mas + writer ref
>         WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
>         sub(0x8000'0000);
>
> vma_end_write()
>         put();

We don't really have vma_end_write(). Instead it's vma_end_write_all()
which increments mm_lock_seq unlocking all write-locked VMAs.
Therefore in vma_start_write() I think we can sub(0x8000'0001) at the
end.

>
> vma_start_read() then becomes something like:
>
>         if (vm_lock_seq == mm_lock_seq)
>           return false;
>
>         cnt = fetch_inc(1);
>         if (cnt & msb || vm_lock_seq == mm_lock_seq) {
>           put();
>           return false;
>         }
>
>         return true;
>
> vma_end_read() then becomes:
>         put();
>
>
> and the down_read() from uffffffd requires mmap_read_lock() and thus
> does not have to worry about writers, it can simpy be inc() and put(),
> no?

I think your proposal should work. Let me try to code it and see if
something breaks.
Thanks Peter!





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux