Re: Folios for 5.15 request - Was: re: Folio discussion recap -

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Sat, 23 Oct 2021 03:22:35 +0100

On Fri, Oct 22, 2021 at 04:40:24PM +0200, David Hildenbrand wrote:
> On 22.10.21 15:01, Matthew Wilcox wrote:
> > On Fri, Oct 22, 2021 at 09:59:05AM +0200, David Hildenbrand wrote:
> >> something like this would roughly express what I've been mumbling about:
> >>
> >> anon_mem    file_mem
> >>    |            |
> >>    ------|------
> >>       lru_mem       slab
> >>          |           |
> >>          -------------
> >>                |
> >> 	      page
> >>
> >> I wouldn't include folios in this picture, because IMHO folios as of now
> >> are actually what we want to be "lru_mem", just which a much clearer
> >> name+description (again, IMHO).
> > 
> > I think folios are a superset of lru_mem.  To enhance your drawing:
> > 
> 
> In the picture below we want "folio" to be the abstraction of "mappable
> into user space", after reading your link below and reading your graph,
> correct? Like calling it "user_mem" instead.

Hmm.  Actually, we want a new layer in the ontology:

page
   folio
      mappable
         lru_mem
            anon_mem
            ksm
            file_mem
         netpool
         devmem
         zonedev
         vmalloc
      zsmalloc
      dmapool
      devmem (*)
   slab
   pgtable
   buddy

(*) yes, devmem appears twice; some is mappable and some is not

The ontology is kind of confusing because *every* page is part of a
folio.  Sometimes it's a folio of one page (eg vmalloc).  Which means
that it's legitimate to call page_folio() on a slab page and then call
folio_test_slab().  It's not the direction we want to go though.

We're also inconsistent about whether we consider an entire compound
page / folio the thing which is mapped, or whether each individual page
in the compound page / folio can be mapped.  See how differently file-THP
and anon-THP are handled in rmap, for example.  I think that was probably
a mistake.

> Because any of these types would imply that we're looking at the head
> page (if it's a compound page). And we could (or even already have?)
> have other types that cannot be mapped to user space that are actually a
> compound page.

Sure, slabs are compound pages which cannot be mapped to userspace.

> > I have a little list of memory types here:
> > https://kernelnewbies.org/MemoryTypes
> > 
> > Let me know if anything is missing.
> 
> hugetlbfs pages might deserve a dedicated type, right?

Not sure.  Aren't they just file pages (albeit sometimes treated
specially, which is one of the mistakes we need to fix)?

> > And starting with file_mem makes the supposition that it's worth splitting
> > file_mem from anon_mem.  I believe that's one or two steps further than
> > it's worth, but I can be convinced otherwise.  For example, do we have
> > examples of file pages being passed to routines that expect anon pages?
> 
> That would be a BUG, so I hope we don't have it ;)

Right.  I'm asking, did we fix any bugs in the last year or two that
were caused by this kind of mismatch and would be prevented by using
a different type?  There's about half a dozen bugs we've had in the
last year that were caused by passing tail pages to functions that
were expecting head pages.

I can think of one problem we have, which is that (for a few filesystems
which have opted into this), we can pass an anon page into ->readpage()
and we've had problems with those filesystems then mishandling the
anon page.  The solution to this problem is not to pass an lru_mem to
readpage, but to use a different fs operation to read swap pages.

> Let's consider folio_wait_writeback(struct folio *folio)
> 
> Do we actually want to pass in a folio here? Would we actually want to
> pass in lru_mem here or even something else?

Well, let's look at the callers (for simplicity, look at Linus'
current tree).  Other than the ones in filesystems which we can assume
have file pages, mm/migrate.c has __unmap_and_move().  What type should
migrate_pages() have and pass around?

> Looking at some core MM code, like mm/huge_memory.c, and seeing all the
> PageAnon() specializations, having a dedicated anon_mem type might be
> valuable. But at this point it's hard to tell if splitting up these
> functions would actually be desirable.

Yes.  That's my point; it *might* be desirable.  I have no objections to
it, but the people doing the work need to show the benefits.  I'm showing
the benefits to folios -- fewer bugs, smaller code, larger pages in the
page cache leading to faster systems.  I acknowledge the costs in terms
of churn.

You can see folios as a first step to disentangling some of the users
of struct page.  It certainly won't be the last step.  But I'd really
like to stop having theoretical discussions of memory types and get on
with writing code.  If that means we modify the fs APIs again in twelve
months to replace folios with file_mem, well, I'm OK with that.