On Thu, Nov 12, 2020 at 10:39:10PM -0800, John Hubbard wrote:
> > IOWs, something like this:
> >
> > struct lpage {
> > 	struct page subpages[4];
> > };
> >
> > static inline struct lpage *page_lpage(struct page *page)
> > {
> > 	unsigned long head = READ_ONCE(page->compound_head);
> >
> > 	if (unlikely(head & 1))
> > 		return (struct lpage *)(head - 1);
> > 	return (struct lpage *)page;
> > }
>
> This is really a "get_head_page()" function, not a "get_large_page()"
> function. But even renaming it doesn't seem quite right, because
> wouldn't it be better to avoid discarding that tail bit information? In
> other words, you might be looking at 3 cases, one of which is *not*
> involving large pages at all:
>
>     The page is a single, non-compound page.
>     The page is a head page of a compound page
>     The page is a tail page of a compound page
>
> ...but this function returns a type of "large page", even for the first
> case. That's misleading, isn't it?

Argh.  Yes, that's part of the problem, so this is still confusing.
An lpage might actually be an order-0 page.  Maybe it needs to be called
something that's not 'page' at all.

There are really four cases:

 - An order-0 page
 - A subpage that happens to be a tail page
 - A subpage that happens to be a head page
 - An order-N page

We have code today that treats tail pages as order-0 pages, but if the
subpage you happen to pass in is a head page, it'll work on the entire
page.  That must, surely, be a bug.

So what if we had:

/* Cache memory */
struct cmem {
	struct page pages[1];
};

Now there's a clear hierarchy.  The page cache stores pointers to cmem.

struct page *cmem_page(struct cmem *cmem, pgoff_t index)
{
	return &cmem->pages[index - cmem->pages[0].index];
}

struct cmem *page_cmem(struct page *page)
{
	unsigned long head = READ_ONCE(page->compound_head);

	if (unlikely(head & 1))
		return (struct cmem *)(head - 1);
	return (struct cmem *)page;
}

and we'll need the usual panoply of functions to get the order/size/...
of a cmem.
We'll also need functions like CMemDirty(), CMemLocked(), CMemWriteback(), etc.
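
To make the cmem <-> page round trip concrete, here is a rough userspace
sketch.  It is not kernel code: struct page is a two-field stand-in,
READ_ONCE()/unlikely() are dropped, pgoff_t is modelled as unsigned long,
and cmem_demo() is a hypothetical helper that exists only for this
illustration.  It assumes an order-2 cmem whose first page has index 8.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in for struct page; only the fields used below. */
struct page {
	uintptr_t compound_head;	/* tail pages: address of head | 1 */
	unsigned long index;		/* page offset within the file */
};

/* Cache memory: overlays the head page of a run of struct pages. */
struct cmem {
	struct page pages[1];
};

/* Subpage of a cmem that covers file offset 'index'. */
static struct page *cmem_page(struct cmem *cmem, unsigned long index)
{
	return &cmem->pages[index - cmem->pages[0].index];
}

/* Any subpage (head, tail, or order-0) back to its containing cmem. */
static struct cmem *page_cmem(struct page *page)
{
	uintptr_t head = page->compound_head;	/* READ_ONCE() in the kernel */

	if (head & 1)
		return (struct cmem *)(head - 1);
	return (struct cmem *)page;
}

/* Hypothetical demo: returns 1 if every round trip behaves as described. */
static int cmem_demo(void)
{
	struct page pages[4] = { { 0, 0 } };
	struct page single = { 0, 42 };	/* an order-0 page */
	struct cmem *cmem;
	int i;

	/* The tag trick needs bit 0 of the head's address to be free. */
	assert(((uintptr_t)&pages[0] & 1) == 0);

	for (i = 0; i < 4; i++) {
		pages[i].index = 8 + i;
		if (i > 0)	/* mark pages 1..3 as tail pages */
			pages[i].compound_head = (uintptr_t)&pages[0] | 1;
	}

	cmem = page_cmem(&pages[2]);	/* tail page -> containing cmem */
	return cmem == (struct cmem *)&pages[0] &&
	       page_cmem(&pages[0]) == cmem &&	/* head page -> same cmem */
	       cmem_page(cmem, 10) == &pages[2] &&	/* index 10 -> third subpage */
	       page_cmem(&single) == (struct cmem *)&single;
}
```

The point of the sketch is that callers holding a cmem can no longer
confuse "the head subpage" with "the whole thing": the only way to get a
struct page out of it is to ask for a specific index.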