Re: [RFC PATCH] docs/mm: add VMA locks documentation

On Mon, Nov 4, 2024 at 10:04 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
> On Mon, Nov 04, 2024 at 09:01:46AM -0800, Suren Baghdasaryan wrote:
> > On Fri, Nov 1, 2024 at 11:51 AM Lorenzo Stoakes
> > <lorenzo.stoakes@xxxxxxxxxx> wrote:
> > > +MM and VMA locks
> > > +----------------
> > > +
> > > +There are two key classes of lock utilised when reading and manipulating VMAs -
> > > +the `mmap_lock` which is a read/write semaphore maintained at the `mm_struct`
> > > +level of granularity and, if CONFIG_PER_VMA_LOCK is set, a per-VMA lock at the
> > > +VMA level of granularity.
> > > +
> > > +.. note::
> > > +
> > > +   Generally speaking, a read/write semaphore is a class of lock which permits
> > > +   concurrent readers. However a write lock can only be obtained once all
> > > +   readers have left the critical region (and pending readers made to wait).
> > > +
> > > +   This renders read locks on a read/write semaphore concurrent with other
> > > +   readers and write locks exclusive against all others holding the semaphore.
> > > +
> > > +If CONFIG_PER_VMA_LOCK is not set, then things are relatively simple - a write
> > > +mmap lock gives you exclusive write access to a VMA, and a read lock gives you
> > > +concurrent read-only access.
> > > +
> > > +In the presence of CONFIG_PER_VMA_LOCK, i.e. VMA locks, things are more
> > > +complicated. In this instance, a write semaphore is no longer enough to gain
> > > +exclusive access to a VMA, a VMA write lock is also required.
> >
> > I think "exclusive access to a VMA" should be "exclusive access to mm"
> > if you are talking about mmap_lock.
>
> Right, but in the past an mm write lock was sufficient to gain exclusive
> access to a _vma_. I will adjust to say 'write semaphore on the mm'.

We might want to introduce some explicit terminology for talking about
types of locks in MM at some point in this document. Like:

 - "high-level locks" (or "metadata locks"?) means mmap lock, VMA
lock, address_space lock, anon_vma lock

 - "pagetable-level locks" means page_table_lock and PMD/PTE spinlocks

 - "write-locked VMA" means mmap lock is held for writing and VMA has
been marked as write-locked

 - "rmap locks" means the address_space and anon_vma locks
   - "holding the rmap locks for writing" means holding both (if applicable)
   - "holding an rmap lock for reading" means holding one of them

 - "read-locked VMA" means either mmap lock held for reading or VMA
lock held for reading

That might make it a bit easier to write concise descriptions of
locking requirements in the rest of this document and keep them

> > > +The VMA lock is implemented via the use of both a read/write semaphore and
> > > +per-VMA and per-mm sequence numbers. We go into detail on this in the VMA lock
> > > +internals section below, so for the time being it is important only to note that
> > > +we can obtain either a VMA read or write lock.
> > > +
> > > +.. note::
> > > +
> > > +   VMAs under VMA **read** lock are obtained by the `lock_vma_under_rcu()`
> > > +   function, and **no** existing mmap or VMA lock must be held, This function
> >
> > "no existing mmap or VMA lock must be held" did you mean to say "no
> > exclusive mmap or VMA locks must be held"? Because one can certainly
> > hold a read-lock on them.
>
> Hmm really? You can hold an mmap read lock and obtain a VMA read lock too
> irrespective of that?

I think you can call lock_vma_under_rcu() while already holding the
mmap read lock, but only because lock_vma_under_rcu() has trylock
semantics. (The other way around leads to a deadlock: You can't take
the mmap read lock while holding a VMA read lock, because the VMA read
lock may prevent another task from write-locking a VMA after it has
already taken an mmap write lock.)
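
To make the ordering argument concrete, here is a minimal userspace
sketch in C. It is a toy, single-threaded model and not kernel code:
`toy_rwsem`, its helpers and the two scenario functions are hypothetical
names, a failed try-lock stands in for "a blocking acquire would have to
wait here", and the per-VMA lock is assumed to behave like a plain
read/write semaphore (the real one also involves sequence numbers).

```c
#include <stdbool.h>

/* Toy model of a read/write semaphore; NOT the kernel's rwsem. */
struct toy_rwsem {
	int readers;	/* current read holders */
	bool writer;	/* true while write-held */
};

/* A failed try-lock models "a blocking acquire would have to wait here". */
static bool toy_try_read(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static bool toy_try_write(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

/*
 * Problematic order: task A holds a VMA read lock and then wants the mmap
 * read lock, while task B takes the mmap write lock and then wants to
 * write-lock the same VMA. Returns true if, in this model, both tasks end
 * up waiting on each other.
 */
static bool vma_then_mmap_deadlocks(void)
{
	struct toy_rwsem mmap_lock = { 0, false };
	struct toy_rwsem vma_lock = { 0, false };

	bool a_has_vma = toy_try_read(&vma_lock);	/* A: VMA read lock */
	bool b_has_mmap = toy_try_write(&mmap_lock);	/* B: mmap write lock */
	bool b_waits = !toy_try_write(&vma_lock);	/* B blocked by A */
	bool a_waits = !toy_try_read(&mmap_lock);	/* A blocked by B */

	return a_has_vma && b_has_mmap && a_waits && b_waits;
}

/*
 * Permitted order: task A holds the mmap read lock and only *tries* the
 * VMA read lock. In this model B cannot get the mmap write lock while A
 * holds it for reading, and A never sleeps on the VMA lock.
 */
static bool mmap_then_vma_trylock_is_safe(void)
{
	struct toy_rwsem mmap_lock = { 0, false };
	struct toy_rwsem vma_lock = { 0, false };

	bool a_has_mmap = toy_try_read(&mmap_lock);	/* A: mmap read lock */
	bool b_got_write = toy_try_write(&mmap_lock);	/* B has to wait for A */
	bool a_got_vma = toy_try_read(&vma_lock);	/* A: trylock, no sleep */

	return a_has_mmap && !b_got_write && a_got_vma;
}
```

Both scenario functions return true in this model: the first because each
task is stuck behind the lock the other one holds, the second because the
only acquisition made while another lock is held is a trylock.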

> > > +mmap write lock downgrading
> > > +---------------------------
> > > +
> > > +While it is possible to obtain an mmap write or read lock using the
> > > +`mm->mmap_lock` read/write semaphore, it is also possible to **downgrade** from
> > > +a write lock to a read lock via `mmap_write_downgrade()`.
> > > +
> > > +Similar to `mmap_write_unlock()`, this implicitly terminates all VMA write locks
> > > +via `vma_end_write_all()` (more on this behaviour in the VMA lock internals
> > > +section below), but importantly does not relinquish the mmap lock while
> > > +downgrading, therefore keeping the locked virtual address space stable.
> > > +
> > > +A subtlety here is that callers can assume, if they invoke an
> > > +mmap_write_downgrade() operation, that they still have exclusive access to the
> > > +virtual address space (excluding VMA read lock holders), as for another task to
> > > +have downgraded they would have had to have exclusive access to the semaphore
> > > +which can't be the case until the current task completes what it is doing.
> >
> > I can't decipher the above paragraph. Could you please dumb it down
> > for the likes of me?
>
> Since you're smarter than me this indicates I am not being clear here :)
> Actually reading this again I've not expressed this correctly.
>
> This is something Jann mentioned, that I hadn't thought of before.
>
> So if you have an mmap write lock, you have exclusive access to the mmap
> (with the usual caveats about racing vma locks unless you vma write lock).
>
> When you downgrade you now have a read lock - but because you were
> exclusive earlier in the function AND any new caller of the function will
> have to acquire that same write lock FIRST, they all have to wait on you
> and therefore you have exclusive access to the mmap only with a read map.
>
> So you are actually guaranteed that nobody else can be racing you _in that
> function_, and equally no other writers can arise until you're done as your
> holding the read lock prevents that.
>
> Jann - correct me if I'm wrong or missing something here.
>
> Will correct this unless Jann tells me I'm missing something on this :)

Yeah, basically you can hold an rwsem in three modes:

 - reader (R)
 - reader that results from downgrading a writer (D)
 - writer (W)

and this is the diagram of which mode excludes which (view it in
monospace; ✔ means mutually exclusive, ✘ means the two can be held
concurrently):

  | R | D | W
==|===|===|===
R | ✘ | ✘ | ✔
--|---|---|---
D | ✘ | ✔ | ✔
--|---|---|---
W | ✔ | ✔ | ✔

So the special thing about downgraded-readers compared to normal
readers is that they exclude other downgraded-readers.
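
The table can be reproduced from a toy model. What follows is a minimal,
single-threaded userspace sketch in C, not the kernel's rwsem
implementation; `toy_rwsem`, `toy_mode` and all helper names are
hypothetical. It assumes only what is described above: a writer needs the
semaphore to be completely free, and a downgrade turns the write hold
into a read hold without ever releasing the semaphore.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a read/write semaphore; NOT the kernel's rwsem. */
struct toy_rwsem {
	int readers;	/* read holders, including downgraded writers */
	bool writer;	/* true while write-held */
};

static bool toy_try_read(struct toy_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static bool toy_try_write(struct toy_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

/* Caller must hold the write lock; it becomes a reader with no gap. */
static void toy_downgrade(struct toy_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
	s->readers++;
}

enum toy_mode { MODE_R, MODE_D, MODE_W };

/* Try to enter in the given mode; a D holder has to pass through W first. */
static bool toy_try_enter(struct toy_rwsem *s, enum toy_mode m)
{
	switch (m) {
	case MODE_R:
		return toy_try_read(s);
	case MODE_W:
		return toy_try_write(s);
	case MODE_D:
		if (!toy_try_write(s))
			return false;
		toy_downgrade(s);
		return true;
	}
	return false;
}

/* True if 'first' and then 'second' can both enter a fresh semaphore. */
static bool toy_enter_both(enum toy_mode first, enum toy_mode second)
{
	struct toy_rwsem s = { 0, false };

	return toy_try_enter(&s, first) && toy_try_enter(&s, second);
}

/* Two modes are mutually exclusive if neither entry order lets both in. */
static bool toy_mutually_exclusive(enum toy_mode a, enum toy_mode b)
{
	return !toy_enter_both(a, b) && !toy_enter_both(b, a);
}
```

In this model `toy_mutually_exclusive(MODE_D, MODE_D)` is true while
`toy_mutually_exclusive(MODE_R, MODE_D)` is false: a second task cannot
reach D because it would need W first, but a plain reader can still join
an existing downgraded reader.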




