On Wed, Sep 09, 2020 at 05:29:04PM +0300, Kirill A. Shutemov wrote:
> On Tue, Sep 08, 2020 at 08:55:29PM +0100, Matthew Wilcox (Oracle) wrote:
> > A compound page in the page cache will not necessarily be of PMD size,
> > so check explicitly.
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > ---
> >  mm/memory.c | 7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 602f4283122f..4b35b4e71e64 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3562,13 +3562,14 @@ static vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
> >  	unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
> >  	pmd_t entry;
> >  	int i;
> > -	vm_fault_t ret;
> > +	vm_fault_t ret = VM_FAULT_FALLBACK;
> >
> >  	if (!transhuge_vma_suitable(vma, haddr))
> > -		return VM_FAULT_FALLBACK;
> > +		return ret;
> >
> > -	ret = VM_FAULT_FALLBACK;
> >  	page = compound_head(page);
> > +	if (page_order(page) != HPAGE_PMD_ORDER)
> > +		return ret;
>
> Maybe also VM_BUG_ON_PAGE(page_order(page) > HPAGE_PMD_ORDER, page)?
> Just in case.

In the patch where I actually start creating THPs, I limit the order to
HPAGE_PMD_ORDER, so we're not going to see this today.  At some point in
the future, I can imagine that we allow THPs larger than PMD size, and
what we'd want alloc_set_pte() to look like is:

	if (pud_none(*vmf->pud) && PageTransCompound(page)) {
		ret = do_set_pud(vmf, page);
		if (ret != VM_FAULT_FALLBACK)
			return ret;
	}
	if (pmd_none(*vmf->pmd) && PageTransCompound(page)) {
		ret = do_set_pmd(vmf, page);
		if (ret != VM_FAULT_FALLBACK)
			return ret;
	}

Once we're in that situation, in do_set_pmd(), we'd want to figure out
which sub-page of the >PMD-sized page to insert.  But I don't want to
write code for that now.

So, what's the right approach if somebody does call alloc_set_pte()
with a >PMD sized page?
It's not exported, so the only two ways to get it called with a >PMD
sized page are to (1) persuade filemap_map_pages() to call it, which
means putting the page in the page cache, or (2) return such a page from
vm_ops->fault.

If someone actually does that (an interesting device driver, perhaps),
I don't think hitting it with a BUG is the right response.  The right
response would be to map the appropriate PMD-sized chunk of the page,
but we don't even do that today -- we map the first PMD-sized chunk of
the page.  With this patch, we'll simply map the appropriate PAGE_SIZE
chunk at the requested address, so this would actually be a bugfix for
such a demented driver.

At some point, it'd be nice to handle this with a PMD, but I don't want
to write that code without a test-case.  We could probably simulate it
with the page cache THP code by being super-aggressive about creating
order-10 pages ... but this is feeling more and more out of scope for
this patch set, which today hit 99 patches.