Re: [RFC] Documentation for folio_lock() and friends

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Thu, 28 Apr 2022 18:28:37 +0100

On Tue, Apr 05, 2022 at 03:48:19PM +1000, NeilBrown wrote:
> On Tue, 05 Apr 2022, Matthew Wilcox wrote:
> > It's a shame to not have these functions documented.  I'm sure I've
> > missed a few things that would be useful to document here.
> > 
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index ab47579af434..47b7851f1b64 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -888,6 +888,18 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
> >  void unlock_page(struct page *page);
> >  void folio_unlock(struct folio *folio);
> >  
> > +/**
> > + * folio_trylock() - Attempt to lock a folio.
> > + * @folio: The folio to attempt to lock.
> > + *
> > + * Sometimes it is undesirable to wait for a folio to be unlocked (eg
> > + * when the locks are being taken in the wrong order, or if making
> > + * progress through a batch of folios is more important than processing
> > + * them in order).  Usually folio_lock() is the correct function to call.
> 
> Usually?
> I think a "see also" type reference to folio_lock() is useful, but I
> don't think "usually" is helpful.

That was supposed to stand in contrast to the "Sometimes".  How about
this:

 * folio_lock() is a sleeping function.  If sleeping is not the right
 * behaviour (eg when the locks are being taken in the wrong order,
 * or if making progress through a batch of folios is more important
 * than processing them in order) then you can use folio_trylock().
 * It is never appropriate to implement a spinlock using folio_trylock().

... if not, could you suggest some better wording?

> > +/**
> > + * folio_lock() - Lock this folio.
> > + * @folio: The folio to lock.
> > + *
> > + * The folio lock protects against many things, probably more than it
> > + * should.  It is primarily held while a folio is being read from storage,
> > + * either from its backing file or from swap.  It is also held while a
> > + * folio is being truncated from its address_space.
> > + *
> > + * Holding the lock usually prevents the contents of the folio from being
> > + * modified by other kernel users, although it does not prevent userspace
> > + * from modifying data if it's mapped.  The lock is not consistently held
> > + * while a folio is being modified by DMA.
> 
> I don't think this paragraph is helpful...  maybe if it listed which
> change *are* prevented by the lock, rather than a few which aren't?

I put that in because it's a common misconception ("holding the page
locked will prevent it from being modified").

> I think it is significant that the lock prevents removal from the page
> cache, and so ->mapping is only stable while the lock is held.  It might
> be worth adding something about that.

That's implied by the last sentence of the first paragraph, but I can
include that.  Actually, ->mapping is also stable if the page is mapped
and you hold the page table lock of a page table it's mapped in.  That's
not theoretical BTW, it's the conditions under which ->dirty_folio() is
called -- either the folio lock is held, OR the folio is mapped and the
PTL is held.  That prevents truncation because truncation unmaps pages
before clearing ->mapping, and it needs to take the PTL to unmap the page.

Hard to express that in a lockdep expression because the PTL isn't
passed to folio_mark_dirty().  That explanation probably doesn't
belong here, so how about ...

 * folio_lock() - Lock this folio.
 * @folio: The folio to lock.
 *
 * The folio lock protects against many things, probably more than it
 * should.  It is primarily held while a folio is being brought uptodate,
 * either from its backing file or from swap.  It is also held while a
 * folio is being truncated from its address_space, so holding the lock
 * is sufficient to keep folio->mapping stable.
 *
 * The folio lock is also held while write() is modifying the page to
 * provide POSIX atomicity guarantees (as long as the write does not
 * cross a page boundary).  Other modifications to the data in the folio
 * do not hold the folio lock and can race with writes, eg DMA and stores
 * to mapped pages.
 *
 * Context: May sleep.  If you need to acquire the locks of two or
 * more folios, they must be in order of ascending index, if they are
 * in the same address_space.  If they are in different address_spaces,
 * acquire the lock of the folio which belongs to the address_space which
 * has the lowest address in memory first.

Looking at the comment on folio_mark_dirty(), it's a bit unhelpful.
How about this:

  * folio_mark_dirty - Mark a folio as being modified.
  * @folio: The folio.
  *
- * For folios with a mapping this should be done with the folio lock held
- * for the benefit of asynchronous memory errors who prefer a consistent
- * dirty state. This rule can be broken in some special cases,
- * but should be better not to.
+ * The folio may not be truncated while this function is running.
+ * Holding the folio lock is sufficient to prevent truncation, but some
+ * callers cannot acquire a sleeping lock.  These callers instead hold
+ * the page table lock for a page table which contains at least one page
+ * in this folio.  Truncation will block on the page table lock as it
+ * unmaps pages before removing the folio from its mapping.
  *
  * Return: True if the folio was newly dirtied, false if it was already dirty.