On Sat, Mar 13, 2021 at 12:37:02PM -0800, Andrew Morton wrote: > On Fri, 5 Mar 2021 04:18:37 +0000 "Matthew Wilcox (Oracle)" <willy@xxxxxxxxxxxxx> wrote: > > > A struct folio refers to an entire (possibly compound) page. A function > > which takes a struct folio argument declares that it will operate on the > > entire compound page, not just PAGE_SIZE bytes. In return, the caller > > guarantees that the pointer it is passing does not point to a tail page. > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> > > --- > > include/linux/mm.h | 30 ++++++++++++++++++++++++++++++ > > include/linux/mm_types.h | 17 +++++++++++++++++ > > Perhaps a new folio.h would be neater. I understand that urge (I'm no fan of the size of mm.h ...), but it ends up pretty interwoven with mm.h. For example: static inline struct zone *folio_zone(const struct folio *folio) { return page_zone(&folio->page); } so we need both struct folio defined here and we need page_zone(). page_zone() is defined in mm.h, so we'd have folio.h including mm.h. But then put_page() calls put_folio(), so we need folio.h included in mm.h. The patch series I have does a lot of movement of page cache functionality from mm.h to filemap.h, so you may end up with a smaller mm.h at the end of it. Right now, it's 10 lines longer than it was. > > +static inline struct folio *next_folio(struct folio *folio) > > +{ > > +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) > > + return (struct folio *)nth_page(&folio->page, folio_nr_pages(folio)); > > +#else > > + return folio + folio_nr_pages(folio); > > +#endif > > +} > > It's a shame this isn't called folio_something(), like the rest of the API. > > Unclear what this does. Some comments would help. folio_next() it can be. I'll add some commentary. > > +static inline unsigned int folio_shift(struct folio *folio) > > +{ > > + return PAGE_SHIFT + folio_order(folio); > > +} > > + > > +static inline size_t folio_size(struct folio *folio) > > +{ > > + return PAGE_SIZE << folio_order(folio); > > +} > > Why size_t? That's pretty rare in this space and I'd have expected > unsigned long. I like to use size_t for things which are the number of bytes represented by an in-memory object. As opposed to all the other things which we use unsigned long for. Maybe that's more common on the filesystem side of the house. > > +static inline struct folio *page_folio(struct page *page) > > +{ > > + unsigned long head = READ_ONCE(page->compound_head); > > + > > + if (unlikely(head & 1)) > > + return (struct folio *)(head - 1); > > + return (struct folio *)page; > > +} > > What purpose does the READ_ONCE() serve? Same purpose as it does in compound_head(): static inline struct page *compound_head(struct page *page) { unsigned long head = READ_ONCE(page->compound_head); if (unlikely(head & 1)) return (struct page *) (head - 1); return page; } I think Kirill would say that it's to defend against splitting if we don't have a refcount on the page yet. So if we do something like walk the page tables, find a PTE, translate it to a struct page, then try to get a reference on the head page, we don't end up with an incoherent answer from 'compound_head()' if it's split in the middle of the call and the page->compound_head field gets assigned some other value. It might return the wrong page, so get_user_pages() still has to check the page is right after it's got the reference, but at least this way it's guaranteed to return something that was right at one time. There might be more to it, but that's my understanding of why the code is currently written this way.