On Tue, Jul 02, 2024 at 05:15:12PM -0700, Yang Shi wrote:
> On 7/2/24 5:04 PM, Yang Shi wrote:
> > On 7/2/24 5:34 AM, Catalin Marinas wrote:
> > > Last time I checked, about a year ago, this was not sufficient. One
> > > issue is that there's no arch_clear_hugetlb_flags() implemented by your
> > > patch, leaving PG_arch_{2,3} set on a page. The other issue was that I
> > > initially tried to do this only on the head page, but this did not go
> > > well with the folio_copy() -> copy_highpage() path, which expects the
> > > PG_arch_* flags on each individual page. The alternative was for
> > > arch_clear_hugetlb_flags() to iterate over all the pages in a folio.
> >
> > Thanks for pointing this out. I did miss this point. I took a quick look
> > at when the PG_ flags are set. IIUC, they are set by post_alloc_hook()
> > for order-0 anonymous folios (clearing the page and tags) and by
> > set_ptes() for the others (just clearing tags), for example THP and
> > hugetlb.
> >
> > I can see THP does set the PG_mte_tagged flag for each subpage. But it
> > seems it does not do it for hugetlb, if I read the code correctly. The
> > call path is:
> >
> >     hugetlb_fault() ->
> >         hugetlb_no_page() ->
> >             set_huge_pte_at() ->
> >                 __set_ptes() ->
> >                     __sync_cache_and_tags()
> >
> > The __set_ptes() is called in a loop:
> >
> >     if (!pte_present(pte)) {
> >         for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
> >             __set_ptes(mm, addr, ptep, pte, 1);
> >         return;
> >     }
> >
> > The ncontig and pgsize are returned by num_contig_ptes(). For a 2M
> > hugetlb page, for example, ncontig is 1 and pgsize is 2M, IIUC. So it
> > means just the head page actually has PG_mte_tagged set. If so,
> > copy_highpage() will just copy the old head page's flag and tags to the
> > new head page, and none of the subpages will have PG_mte_tagged set.
> >
> > Is this the expected behavior? I suppose we need tags for every subpage
> > too, right?
>
> We would need something like the below to have the tags and page flags
> set up for each subpage:
>
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 3f09ac73cce3..528164deef27 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -228,9 +228,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>  	int ncontig;
>  	unsigned long pfn, dpfn;
>  	pgprot_t hugeprot;
> +	unsigned long nr = sz >> PAGE_SHIFT;
>
>  	ncontig = num_contig_ptes(sz, &pgsize);
>
> +	__sync_cache_and_tags(pte, nr);
> +
>  	if (!pte_present(pte)) {
>  		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>  			__set_ptes(mm, addr, ptep, pte, 1);

We only need this for the last loop, when we have a present pte. I'd
also avoid calling __set_ptes(..., 1) if we call __sync_cache_and_tags()
here already. Basically, just unroll __set_ptes() in set_huge_pte_at()
and call __set_pte() directly; a rough sketch of what I mean is below,
before the stashed diff.

It might be better to convert those page flag checks to only happen on
the head page. My stashed changes from over a year ago (before we had
more folio conversions) are below. However, as I mentioned, I got stuck
on folio_copy(), which also does a cond_resched() between
copy_highpage() calls.
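An untested sketch of that unrolling (for brevity it drops the
page_table_check_ptes_set() and __check_safe_pte_update() hooks that
__set_ptes() would otherwise run, so a real patch would need to keep
those):

void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
		     pte_t *ptep, pte_t pte, unsigned long sz)
{
	size_t pgsize;
	int i, ncontig;
	unsigned long pfn, dpfn;
	pgprot_t hugeprot;

	ncontig = num_contig_ptes(sz, &pgsize);

	/* !present ptes need no cache or tag maintenance */
	if (!pte_present(pte)) {
		for (i = 0; i < ncontig; i++, ptep++)
			__set_pte(ptep, pte);
		return;
	}

	/* sync caches and tags once for the whole mapping */
	__sync_cache_and_tags(pte, sz >> PAGE_SHIFT);

	if (!pte_cont(pte)) {
		__set_pte(ptep, pte);
		return;
	}

	pfn = pte_pfn(pte);
	dpfn = pgsize >> PAGE_SHIFT;
	hugeprot = pte_pgprot(pte);

	clear_flush(mm, addr, ptep, pgsize, ncontig);

	for (i = 0; i < ncontig; i++, ptep++, pfn += dpfn)
		__set_pte(ptep, pfn_pte(pfn, hugeprot));
}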
---------8<--------------------------------
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index f5bcb0dc6267..a84ad0e46f12 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -49,7 +49,9 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
 		return;
 
 	if (try_page_mte_tagging(page)) {
-		mte_clear_page_tags(page_address(page));
+		unsigned long i, nr_pages = compound_nr(page);
+		for (i = 0; i < nr_pages; i++)
+			mte_clear_page_tags(page_address(page + i));
 		set_page_mte_tagged(page);
 	}
 }
@@ -57,22 +59,23 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
 void mte_sync_tags(pte_t old_pte, pte_t pte)
 {
 	struct page *page = pte_page(pte);
-	long i, nr_pages = compound_nr(page);
-	bool check_swap = nr_pages == 1;
+	bool check_swap = !PageCompound(page);
 	bool pte_is_tagged = pte_tagged(pte);
 
 	/* Early out if there's nothing to do */
 	if (!check_swap && !pte_is_tagged)
 		return;
 
+	/*
+	 * HugeTLB pages are always fully mapped, so only setting head page's
+	 * PG_mte_* flags is enough.
+	 */
+	if (PageHuge(page))
+		page = compound_head(page);
+
 	/* if PG_mte_tagged is set, tags have already been initialised */
-	for (i = 0; i < nr_pages; i++, page++) {
-		if (!page_mte_tagged(page)) {
-			mte_sync_page_tags(page, old_pte, check_swap,
-					   pte_is_tagged);
-			set_page_mte_tagged(page);
-		}
-	}
+	if (!page_mte_tagged(page))
+		mte_sync_page_tags(page, old_pte, check_swap, pte_is_tagged);
 
 	/* ensure the tags are visible before the PTE is set */
 	smp_wmb();
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5626ddb540ce..692198d8c0d2 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -1079,7 +1079,7 @@ long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
 
 		/* uaccess failed, don't leave stale tags */
 		if (num_tags != MTE_GRANULES_PER_PAGE)
-			mte_clear_page_tags(page);
+			mte_clear_page_tags(page_address(page));
 
 		set_page_mte_tagged(page);
 		kvm_release_pfn_dirty(pfn);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31d7fa4c7c14..b531b1d75cda 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1173,11 +1173,10 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 	if (!kvm_has_mte(kvm))
 		return;
 
-	for (i = 0; i < nr_pages; i++, page++) {
-		if (try_page_mte_tagging(page)) {
-			mte_clear_page_tags(page_address(page));
-			set_page_mte_tagged(page);
-		}
+	if (try_page_mte_tagging(page)) {
+		for (i = 0; i < nr_pages; i++)
+			mte_clear_page_tags(page_address(page + i));
+		set_page_mte_tagged(page);
 	}
 }

-- 
Catalin
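P.S. For reference, the folio_copy() I got stuck on looks roughly like
this in mm/util.c:

void folio_copy(struct folio *dst, struct folio *src)
{
	long i = 0;
	long nr = folio_nr_pages(src);

	for (;;) {
		copy_highpage(folio_page(dst, i), folio_page(src, i));
		if (++i == nr)
			break;
		cond_resched();
	}
}

Each subpage is copied via a separate copy_highpage() call, and the
arm64 copy_highpage() decides whether to copy tags based on
page_mte_tagged() of that particular source page, which is why
head-page-only PG_mte_* flags don't survive a folio copy.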