Re: [PATCH] hugetlbfs: add MTE support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jul 02, 2024 at 05:15:12PM -0700, Yang Shi wrote:
> On 7/2/24 5:04 PM, Yang Shi wrote:
> > On 7/2/24 5:34 AM, Catalin Marinas wrote:
> > > Last time I checked, about a year ago, this was not sufficient. One
> > > issue is that there's no arch_clear_hugetlb_flags() implemented by your
> > > patch, leaving PG_arch_{2,3} set on a page. The other issue was that I
> > > initially tried to do this only on the head page but this did not go
> > > well with the folio_copy() -> copy_highpage() which expects the
> > > PG_arch_* flags on each individual page. The alternative was for
> > > arch_clear_hugetlb_flags() to iterate over all the pages in a folio.
> > 
> > Thanks for pointing this out. I did miss this point. I took a quick look
> > at when the PG_ flags are set. IIUC, it is set by post_alloc_hook() for
> > order-0 anonymous folio (clearing page and tags) and set_ptes() for
> > others (just clear tags), for example, THP and hugetlb.
> > 
> > I can see THP does set the PG_mte_tagged flag for each sub pages. But it
> > seems it does not do it for hugetlb if I read the code correctly. The
> > call path is:
> > 
> > hugetlb_fault() ->
> >   hugetlb_no_page->
> >     set_huge_pte_at ->
> >       __set_ptes() ->
> >         __sync_cache_and_tags() ->
> > 
> > 
> > The __set_ptes() is called in a loop:
> > 
> > if (!pte_present(pte)) {
> >         for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
> >             __set_ptes(mm, addr, ptep, pte, 1);
> >         return;
> >     }
> > 
> > The ncontig and pgsize are returned by num_contig_ptes(). For example,
> > 2M hugetlb, ncontig is 1 and pgsize is 2M IIUC. So it means actually
> > just the head page has PG_mte_tagged set. If so the copy_highpage() will
> > just copy the old head page's flag to the new head page, and the tag.
> > All the sub pages don't have PG_mte_tagged set.
> > 
> > Is it expected behavior? I'm supposed we need tags for every sub pages
> > too, right?
> 
> We should need something like the below to have tags and page flag set up
> for each sub page:
> 
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 3f09ac73cce3..528164deef27 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -228,9 +228,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned
> long addr,
>         int ncontig;
>         unsigned long pfn, dpfn;
>         pgprot_t hugeprot;
> +       unsigned long nr = sz >> PAGE_SHIFT;
> 
>         ncontig = num_contig_ptes(sz, &pgsize);
> 
> +       __sync_cache_and_tags(pte, nr);
> +
>         if (!pte_present(pte)) {
>                 for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>                         __set_ptes(mm, addr, ptep, pte, 1);

We only need this for the last loop when we have a present pte. I'd also
avoid calling __set_ptes(...., 1) if we call the __sync_cache_and_tags()
here already. Basically just unroll __set_ptes() in set_huge_pte_at()
and call __set_pte() directly.

It might be better to convert those page flag checks to only happen on
the head page. My stashed changes from over a year ago (before we had
more folio conversions) below. However, as I mentioned, I got stuck on
folio_copy() which also does a cond_resched() between copy_highpage().

---------8<--------------------------------
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index f5bcb0dc6267..a84ad0e46f12 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -49,7 +49,9 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
 		return;
 
 	if (try_page_mte_tagging(page)) {
-		mte_clear_page_tags(page_address(page));
+		unsigned long i, nr_pages = compound_nr(page);
+		for (i = 0; i < nr_pages; i++)
+			mte_clear_page_tags(page_address(page + i));
 		set_page_mte_tagged(page);
 	}
 }
@@ -57,22 +59,23 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
 void mte_sync_tags(pte_t old_pte, pte_t pte)
 {
 	struct page *page = pte_page(pte);
-	long i, nr_pages = compound_nr(page);
-	bool check_swap = nr_pages == 1;
+	bool check_swap = !PageCompound(page);
 	bool pte_is_tagged = pte_tagged(pte);
 
 	/* Early out if there's nothing to do */
 	if (!check_swap && !pte_is_tagged)
 		return;
 
+	/*
+	 * HugeTLB pages are always fully mapped, so only setting head page's
+	 * PG_mte_* flags is enough.
+	 */
+	if (PageHuge(page))
+		page = compound_head(page);
+
 	/* if PG_mte_tagged is set, tags have already been initialised */
-	for (i = 0; i < nr_pages; i++, page++) {
-		if (!page_mte_tagged(page)) {
-			mte_sync_page_tags(page, old_pte, check_swap,
-					   pte_is_tagged);
-			set_page_mte_tagged(page);
-		}
-	}
+	if (!page_mte_tagged(page))
+		mte_sync_page_tags(page, old_pte, check_swap, pte_is_tagged);
 
 	/* ensure the tags are visible before the PTE is set */
 	smp_wmb();
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5626ddb540ce..692198d8c0d2 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -1079,7 +1079,7 @@ long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
 
 			/* uaccess failed, don't leave stale tags */
 			if (num_tags != MTE_GRANULES_PER_PAGE)
-				mte_clear_page_tags(page);
+				mte_clear_page_tags(page_address(page));
 			set_page_mte_tagged(page);
 
 			kvm_release_pfn_dirty(pfn);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31d7fa4c7c14..b531b1d75cda 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1173,11 +1173,10 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 	if (!kvm_has_mte(kvm))
 		return;
 
-	for (i = 0; i < nr_pages; i++, page++) {
-		if (try_page_mte_tagging(page)) {
-			mte_clear_page_tags(page_address(page));
-			set_page_mte_tagged(page);
-		}
+	if (try_page_mte_tagging(page)) {
+		for (i = 0; i < nr_pages; i++)
+			mte_clear_page_tags(page_address(page + i));
+		set_page_mte_tagged(page);
 	}
 }
 

-- 
Catalin




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux