On Mon, Aug 23, 2021 at 08:01:44PM +0100, Matthew Wilcox wrote:
> Hi Linus,
>
> I'm sending this pull request a few days before the merge window
> opens so you have time to think about it. I don't intend to make any
> further changes to the branch, so I've created the tag and signed it.
> It's been in Stephen's next tree for a few weeks with only minor problems
> (now addressed).
>
> The point of all this churn is to allow filesystems and the page cache
> to manage memory in larger chunks than PAGE_SIZE. The original plan was
> to use compound pages like THP does, but I ran into problems with some
> functions that take a struct page expect only a head page while others
> expect the precise page containing a particular byte.
>
> This pull request converts just parts of the core MM and the page cache.
> For 5.16, we intend to convert various filesystems (XFS and AFS are ready;
> other filesystems may make it) and also convert more of the MM and page
> cache to folios. For 5.17, multi-page folios should be ready.
>
> The multi-page folios offer some improvement to some workloads. The 80%
> win is real, but appears to be an artificial benchmark (postgres startup,
> which isn't a serious workload). Real workloads (eg building the kernel,
> running postgres in a steady state, etc) seem to benefit between 0-10%.
> I haven't heard of any performance losses as a result of this series.
> Nobody has done any serious performance tuning; I imagine that tweaking
> the readahead algorithm could provide some more interesting wins.
> There are also other places where we could choose to create large folios
> and currently do not, such as writes that are larger than PAGE_SIZE.
>
> I'd like to thank all my reviewers who've offered review/ack tags:
>
> Christoph Hellwig <hch@xxxxxx>
> David Howells <dhowells@xxxxxxxxxx>
> Jan Kara <jack@xxxxxxx>
> Jeff Layton <jlayton@xxxxxxxxxx>
> Johannes Weiner <hannes@xxxxxxxxxxx>

Just to clarify, I'm only on this list because I acked 3 smaller,
independent memcg cleanup patches in this series. I have repeatedly
expressed strong reservations over folios themselves.

The arguments for a better data interface between mm and filesystem in
light of variable page sizes are plentiful and convincing. But from an
MM point of view, it's anything but clear where the delineation between
the page and folio is, and what the endgame is supposed to look like.

On one hand, the ambition appears to be to substitute folio for
everything that could be a base page or a compound page even inside
core MM code. Since there are very few places in the MM code that
expressly deal with tail pages in the first place, this amounts to a
conversion of most MM code - including the LRU management, reclaim,
rmap, migrate, swap, page fault code etc. - away from "the page".

However, this far exceeds the goal of a better mm-fs interface. And
the value proposition of a full MM-internal conversion, including
e.g. the less exposed anon page handling, is much more nebulous. It's
been proposed to leave anon pages out, but IMO to keep that direction
maintainable, the folio would have to be translated to a page quite
early when entering MM code, rather than propagating it inward, in
order to avoid huge, massively overlapping page and folio APIs.

It's also not clear to me that using the same abstraction for compound
pages and the file cache object is future proof. It's evident from
scalability issues in the allocator, reclaim, compaction, etc. that
with current memory sizes and IO devices, we're hitting the limits of
efficiently managing memory in 4k base pages by default.
It's also clear that we'll continue to have a need for 4k cache
granularity for quite a few workloads that work with large numbers of
small files. I'm not sure how this could be resolved other than
divorcing the idea of a (larger) base page from the idea of cache
entries that can correspond, if necessary, to memory chunks smaller
than a default page.

A longer thread on that can be found here:
https://lore.kernel.org/linux-fsdevel/YFja%2FLRC1NI6quL6@xxxxxxxxxxx/

As an MM stakeholder, I don't think folios are the answer for MM code.