On 11/10/2024 12:33, David Hildenbrand wrote: > On 11.10.24 13:29, Ryan Roberts wrote: >> On 11/10/2024 11:24, David Hildenbrand wrote: >>> We (or rather, readahead logic :) ) might be allocating a THP in the >>> pagecache and then try mapping it into a process that explicitly disabled >>> THP: we might end up installing PMD mappings. >>> >>> This is a problem for s390x KVM, which explicitly remaps all PMD-mapped >>> THPs to be PTE-mapped in s390_enable_sie()->thp_split_mm(), before >>> starting the VM. >>> >>> For example, starting a VM backed on a file system with large folios >>> supported makes the VM crash when the VM tries accessing such a mapping >>> using KVM. >>> >>> Is it also a problem when the HW disabled THP using >>> TRANSPARENT_HUGEPAGE_UNSUPPORTED? At least on x86 this would be the case >>> without X86_FEATURE_PSE. >>> >>> In the future, we might be able to do better on s390x and only disallow >>> PMD mappings -- what s390x and likely TRANSPARENT_HUGEPAGE_UNSUPPORTED >>> really wants. For now, fix it by essentially performing the same check as >>> would be done in __thp_vma_allowable_orders() or in shmem code, where this >>> works as expected, and disallow PMD mappings, making us fallback to PTE >>> mappings. >>> >>> Reported-by: Leo Fu <bfu@xxxxxxxxxx> >>> Fixes: 793917d997df ("mm/readahead: Add large folio readahead") >> >> Will this patch be difficult to backport given it depends on the previous patch >> and that doesn't have a Fixes tag? > > "difficult" -- not really. Andrew might want to tag patch #1 with "Fixes:" as > well, but I can also send simple stable backports that avoid patch #1. > > (Thinking again, I assume we want to Cc:stable) > >> >>> Cc: Thomas Huth <thuth@xxxxxxxxxx> >>> Cc: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> >>> Cc: Ryan Roberts <ryan.roberts@xxxxxxx> >>> Cc: Christian Borntraeger <borntraeger@xxxxxxxxxxxxx> >>> Cc: Janosch Frank <frankja@xxxxxxxxxxxxx> >>> Cc: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx> >>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> >>> --- >>> mm/memory.c | 9 +++++++++ >>> 1 file changed, 9 insertions(+) >>> >>> diff --git a/mm/memory.c b/mm/memory.c >>> index 2366578015ad..a2e501489517 100644 >>> --- a/mm/memory.c >>> +++ b/mm/memory.c >>> @@ -4925,6 +4925,15 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct >>> page *page) >>> pmd_t entry; >>> vm_fault_t ret = VM_FAULT_FALLBACK; >>> + /* >>> + * It is too late to allocate a small folio, we already have a large >>> + * folio in the pagecache: especially s390 KVM cannot tolerate any >>> + * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any >>> + * PMD mappings if THPs are disabled. >>> + */ >>> + if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags)) >>> + return ret; >> >> Why not just call thp_vma_allowable_orders()? > > Why call thp_vma_allowable_orders() that does a lot more work that doesn't > really apply here? :) Yeah fair enough, I was just thinking it makes the code simpler to keep all the checks in one place. But no strong opinion. Either way: Reviewed-by: Ryan Roberts <ryan.roberts@xxxxxxx> > > I'd say, just like shmem, we handle this separately here. >