On Wed, Jul 26, 2023 at 05:32:07PM +0100, Ryan Roberts wrote:
> On 26/07/2023 17:19, Nathan Chancellor wrote:
> > Hi Ryan,
> >
> > On Thu, Jul 20, 2023 at 12:29:55PM +0100, Ryan Roberts wrote:
> >> This allows batching the rmap removal with folio_remove_rmap_range(),
> >> which means we avoid spuriously adding a partially unmapped folio to the
> >> deferred split queue in the common case, which reduces split queue lock
> >> contention.
> >>
> >> Previously each page was removed from the rmap individually with
> >> page_remove_rmap(). If the first page belonged to a large folio, this
> >> would cause page_remove_rmap() to conclude that the folio was now
> >> partially mapped and add the folio to the deferred split queue. But
> >> subsequent calls would cause the folio to become fully unmapped, meaning
> >> there is no value to adding it to the split queue.
> >>
> >> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> >> ---
> >>  mm/memory.c | 120 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 120 insertions(+)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index 01f39e8144ef..189b1cfd823d 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -1391,6 +1391,94 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
> >>  	pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
> >>  }
> >>
> >> +static inline unsigned long page_cont_mapped_vaddr(struct page *page,
> >> +				struct page *anchor, unsigned long anchor_vaddr)
> >> +{
> >> +	unsigned long offset;
> >> +	unsigned long vaddr;
> >> +
> >> +	offset = (page_to_pfn(page) - page_to_pfn(anchor)) << PAGE_SHIFT;
> >> +	vaddr = anchor_vaddr + offset;
> >> +
> >> +	if (anchor > page) {
> >> +		if (vaddr > anchor_vaddr)
> >> +			return 0;
> >> +	} else {
> >> +		if (vaddr < anchor_vaddr)
> >> +			return ULONG_MAX;
> >> +	}
> >> +
> >> +	return vaddr;
> >> +}
> >> +
> >> +static int folio_nr_pages_cont_mapped(struct folio *folio,
> >> +				      struct page *page, pte_t *pte,
> >> +				      unsigned long addr, unsigned long end)
> >> +{
> >> +	pte_t ptent;
> >> +	int floops;
> >> +	int i;
> >> +	unsigned long pfn;
> >> +	struct page *folio_end;
> >> +
> >> +	if (!folio_test_large(folio))
> >> +		return 1;
> >> +
> >> +	folio_end = &folio->page + folio_nr_pages(folio);
> >> +	end = min(page_cont_mapped_vaddr(folio_end, page, addr), end);
> >> +	floops = (end - addr) >> PAGE_SHIFT;
> >> +	pfn = page_to_pfn(page);
> >> +	pfn++;
> >> +	pte++;
> >> +
> >> +	for (i = 1; i < floops; i++) {
> >> +		ptent = ptep_get(pte);
> >> +
> >> +		if (!pte_present(ptent) || pte_pfn(ptent) != pfn)
> >> +			break;
> >> +
> >> +		pfn++;
> >> +		pte++;
> >> +	}
> >> +
> >> +	return i;
> >> +}
> >> +
> >> +static unsigned long try_zap_anon_pte_range(struct mmu_gather *tlb,
> >> +					    struct vm_area_struct *vma,
> >> +					    struct folio *folio,
> >> +					    struct page *page, pte_t *pte,
> >> +					    unsigned long addr, int nr_pages,
> >> +					    struct zap_details *details)
> >> +{
> >> +	struct mm_struct *mm = tlb->mm;
> >> +	pte_t ptent;
> >> +	bool full;
> >> +	int i;
> >> +
> >> +	for (i = 0; i < nr_pages;) {
> >> +		ptent = ptep_get_and_clear_full(mm, addr, pte, tlb->fullmm);
> >> +		tlb_remove_tlb_entry(tlb, pte, addr);
> >> +		zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
> >> +		full = __tlb_remove_page(tlb, page, 0);
> >> +
> >> +		if (unlikely(page_mapcount(page) < 1))
> >> +			print_bad_pte(vma, addr, ptent, page);
> >> +
> >> +		i++;
> >> +		page++;
> >> +		pte++;
> >> +		addr += PAGE_SIZE;
> >> +
> >> +		if (unlikely(full))
> >> +			break;
> >> +	}
> >> +
> >> +	folio_remove_rmap_range(folio, page - i, i, vma);
> >> +
> >> +	return i;
> >> +}
> >> +
> >>  static unsigned long zap_pte_range(struct mmu_gather *tlb,
> >>  				struct vm_area_struct *vma, pmd_t *pmd,
> >>  				unsigned long addr, unsigned long end,
> >> @@ -1428,6 +1516,38 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
> >>  		page = vm_normal_page(vma, addr, ptent);
> >>  		if (unlikely(!should_zap_page(details, page)))
> >>  			continue;
> >> +
> >> +		/*
> >> +		 * Batch zap large anonymous folio mappings. This allows
> >> +		 * batching the rmap removal, which means we avoid
> >> +		 * spuriously adding a partially unmapped folio to the
> >> +		 * deferred split queue in the common case, which
> >> +		 * reduces split queue lock contention.
> >> +		 */
> >> +		if (page && PageAnon(page)) {
> >> +			struct folio *folio = page_folio(page);
> >> +			int nr_pages_req, nr_pages;
> >> +
> >> +			nr_pages_req = folio_nr_pages_cont_mapped(
> >> +						folio, page, pte, addr, end);
> >> +
> >> +			nr_pages = try_zap_anon_pte_range(tlb, vma,
> >> +						folio, page, pte, addr,
> >> +						nr_pages_req, details);
> >> +
> >> +			rss[mm_counter(page)] -= nr_pages;
> >> +			nr_pages--;
> >> +			pte += nr_pages;
> >> +			addr += nr_pages << PAGE_SHIFT;
> >> +
> >> +			if (unlikely(nr_pages < nr_pages_req)) {
> >> +				force_flush = 1;
> >> +				addr += PAGE_SIZE;
> >> +				break;
> >> +			}
> >> +			continue;
> >> +		}
> >> +
> >>  		ptent = ptep_get_and_clear_full(mm, addr, pte,
> >>  						tlb->fullmm);
> >>  		tlb_remove_tlb_entry(tlb, pte, addr);
> >> --
> >> 2.25.1
> >>
> >
> > After this change in -next as commit 904d9713b3b0 ("mm: batch-zap large
> > anonymous folio PTE mappings"), I see the following splats several times
> > when booting Debian's s390x configuration (which I have mirrored at [1])
> > in QEMU (bisect log below):
>
> Thanks for the bug report and sorry for the inconvenience. I'm going to need a
> little time to figure out a build environment for s390x and get it reproducing.
> Hopefully I can come back to you tomorrow with a fix.

No worries! For what it's worth, if you are not already aware of it,
there are GCC toolchains on kernel.org, which is what I use in general
and in this particular case:

https://kernel.org/pub/tools/crosstool/

You can just download them to somewhere on your drive then use
CROSS_COMPILE=.../bin/s390-linux-gnu-, rather than downloading a bunch
of distribution packages.
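
In case a concrete example helps, the workflow is roughly the following;
the GCC version (13.2.0), the install location under $HOME/toolchains,
and the exact tarball/directory names are only placeholders, so
substitute whatever release you grab from the page above:

  # Unpack a crosstool release downloaded from the page above
  # (file name varies with GCC version and host architecture).
  mkdir -p $HOME/toolchains && cd $HOME/toolchains
  tar -xf x86_64-gcc-13.2.0-nolibc-s390-linux-gnu.tar.xz

  # Point kbuild at the unpacked bin/ prefix for an s390 build.
  CROSS=$HOME/toolchains/gcc-13.2.0-nolibc/s390-linux-gnu/bin/s390-linux-gnu-
  make ARCH=s390 CROSS_COMPILE=$CROSS defconfig
  make ARCH=s390 CROSS_COMPILE=$CROSS -j$(nproc)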

Cheers,
Nathan