On s390x, we actually need a pte_mkyoung() instead of a
mark_page_accessed() when doing a FOLL_TOUCH to clear the HW invalid bit
in the pte and allow subsequent accesses via the MMU to succeed without
triggering a pagefault.

Otherwise, buffered I/O will loop forever because it will keep stumbling
over the set HW invalid bit, requiring a page fault, which is disabled.

Reported-by: Andreas Gruenbacher <agruenba@xxxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
 mm/gup.c | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index a9d4d724aef7..d6c65474ed72 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -592,10 +592,27 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 			set_page_dirty(page);
 		/*
 		 * pte_mkyoung() would be more correct here, but atomic care
-		 * is needed to avoid losing the dirty bit: it is easier to use
-		 * mark_page_accessed().
+		 * is needed for architectures that have a hw dirty bit, to
+		 * avoid losing the dirty bit: it is easier to use
+		 * mark_page_accessed() for these architectures.
+		 *
+		 * s390x doesn't have a hw reference/dirty bit and sets the
+		 * hw invalid bit in pte_mkold(), to catch further references.
+		 * We have to update the pte via pte_mkyoung() here to clear the
+		 * invalid bit and mark the page young; otherwise, callers that
+		 * rely on not requiring a MMU fault once GUP(FOLL_TOUCH)
+		 * succeeded will loop forever because the page won't be
+		 * actually accessible via the MMU.
 		 */
-		mark_page_accessed(page);
+		if (IS_ENABLED(CONFIG_S390)) {
+			pte = pte_mkyoung(pte);
+			if (!pte_same(pte, *ptep)) {
+				set_pte_at(vma->vm_mm, address, ptep, pte);
+				update_mmu_cache(vma, address, ptep);
+			}
+		} else {
+			mark_page_accessed(page);
+		}
 	}
 	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
 		/* Do not mlock pte-mapped THP */
-- 
2.35.1

We should probably generalize this, using an ARCH config that says that
we don't have HW dirty bits and can do a pte_mkyoung() here without
losing any concurrent updates to the pte via the hw.

Further, I wonder if we might have to do a pte_mkdirty() in case of
FOLL_WRITE for these architectures as well, instead of going via
set_page_dirty(). It could be that this is required here as well; I
haven't looked into the details.

The follow_trans_huge_pmd()->touch_pmd() case should be fine I guess,
as it does both the pmd_mkyoung() and the pmd_mkdirty().

-- 
Thanks,

David / dhildenb
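
To make the failure mode from the commit message concrete, here is a toy
userspace model. It is not kernel code: the bit values and every toy_*
name are made up for illustration and do not match the real s390 pte
layout. It only assumes what the patch comment states, namely that
pte_mkold() sets an invalid bit and pte_mkyoung() clears it again, and
it bounds the retry loop so that the "loops forever" case becomes
observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy pte bits; illustrative values only, not the real s390 layout. */
#define TOY_PTE_INVALID	0x1u	/* hardware refuses the translation when set */
#define TOY_PTE_YOUNG	0x2u	/* software-maintained referenced bit */

typedef uint32_t toy_pte_t;

/* Aging: clear young and set invalid so that the next access traps. */
static toy_pte_t toy_pte_mkold(toy_pte_t pte)
{
	return (pte & ~TOY_PTE_YOUNG) | TOY_PTE_INVALID;
}

/* Mark referenced and make the translation usable by hardware again. */
static toy_pte_t toy_pte_mkyoung(toy_pte_t pte)
{
	return (pte | TOY_PTE_YOUNG) & ~TOY_PTE_INVALID;
}

/* True if a hardware access would succeed without taking a fault. */
static bool toy_mmu_can_access(toy_pte_t pte)
{
	return !(pte & TOY_PTE_INVALID);
}

/*
 * Model of a copy loop running with page faults disabled: retry
 * "touch, then access" up to max_tries times. Returns the number of
 * attempts needed, or -1 if the access never succeeded.
 */
static int toy_copy_nofault(toy_pte_t *ptep, bool touch_updates_pte,
			    int max_tries)
{
	for (int i = 1; i <= max_tries; i++) {
		if (touch_updates_pte)
			*ptep = toy_pte_mkyoung(*ptep);
		/* else: only per-page state would change, the pte stays as is */
		if (toy_mmu_can_access(*ptep))
			return i;
	}
	return -1;
}

/* Usage: an aged pte is only accessible once the touch updates the pte. */
static void toy_demo(void)
{
	toy_pte_t pte = toy_pte_mkold(TOY_PTE_YOUNG);

	assert(toy_copy_nofault(&pte, false, 1000) == -1);
	assert(toy_copy_nofault(&pte, true, 1000) == 1);
}
```

In this model, a touch that leaves the pte alone (the
mark_page_accessed() analogue) never makes progress, while a touch that
rewrites the pte succeeds on the first attempt.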