Re: Folio discussion recap

Johannes Weiner <hannes@xxxxxxxxxxx> · Tue, 21 Sep 2021 17:59:33 -0400

On Tue, Sep 21, 2021 at 09:38:54PM +0100, Matthew Wilcox wrote:
> On Tue, Sep 21, 2021 at 03:47:29PM -0400, Johannes Weiner wrote:
> > This discussion is now about whether folio are suitable for anon pages
> > as well. I'd like to reiterate that regardless of the outcome of this
> > discussion I think we should probably move ahead with the page cache
> > bits, since people are specifically blocked on those and there is no
> > dependency on the anon stuff, as the conversion is incremental.
> 
> So you withdraw your NAK for the 5.15 pull request which is now four
> weeks old and has utterly missed the merge window?

Once you drop the bits that convert shared anon and file
infrastructure, yes. Because we haven't discussed yet, nor agree on,
that folio are the way forward for anon pages.

> > and so the justification for replacing page with folio *below* those
> > entry points to address tailpage confusion becomes nil: there is no
> > confusion. Move the anon bits to anon_page and leave the shared bits
> > in page. That's 912 lines of swap_state.c we could mostly leave alone.
> 
> Your argument seems to be based on "minimising churn". Which is certainly
> a goal that one could have, but I think in this case is actually harmful.
> There are hundreds, maybe thousands, of functions throughout the kernel
> (certainly throughout filesystems) which assume that a struct page is
> PAGE_SIZE bytes.  Yes, every single one of them is buggy to assume that,
> but tracking them all down is a never-ending task as new ones will be
> added as fast as they can be removed.

What does that have to do with anon pages?

> > The same is true for the LRU code in swap.c. Conceptually, already no
> > tailpages *should* make it onto the LRU. Once the high-level page
> > instantiation functions - add_to_page_cache_lru, do_anonymous_page -
> > have type safety, you really do not need to worry about tail pages
> > deep in the LRU code. 1155 more lines of swap.c.
> 
> It's actually impossible in practice as well as conceptually.  The list
> LRU is in the union with compound_head, so you cannot put a tail page
> onto the LRU.  But yet we call compound_head() on every one of them
> multiple times because our current type system does not allow us to
> express "this is not a tail page".

No, because we haven't identified *who actually needs* these calls
and move them up and out of the low-level helpers.

It was a mistake to add them there, yes. But they were added recently
for rather few callers. And we've had people send patches already to
move them where they are actually needed.

Of course converting *absolutely everybody else* to not-tailpage
instead will also fix the problem... I just don't agree that this is
an appropriate response to the issue.

Asking again: who conceptually deals with tail pages in MM? LRU and
reclaim don't. The page cache doesn't. Compaction doesn't. Migration
doesn't. All these data structures and operations are structured
around headpages, because that's the logical unit they operate on. The
notable exception, of course, are the page tables because they map the
pfns of tail pages. But is that it?  Does it come down to page table
walkers encountering pte-mapped tailpages? And needing compound_head()
before calling mark_page_accessed() or set_page_dirty()?

We couldn't fix vm_normal_page() to handle this? And switch khugepaged
to a new vm_raw_page() or whatever?

It should be possible to answer this question as part of the case for
converting tens of thousands of lines of code to folio.