On 7/9/24 10:42 AM, Yang Shi wrote:
> On 7/4/24 6:44 AM, Catalin Marinas wrote:
>> On Tue, Jul 02, 2024 at 05:15:12PM -0700, Yang Shi wrote:
>>> On 7/2/24 5:04 PM, Yang Shi wrote:
>>>> On 7/2/24 5:34 AM, Catalin Marinas wrote:
>>>>> Last time I checked, about a year ago, this was not sufficient. One
>>>>> issue is that there's no arch_clear_hugetlb_flags() implemented by
>>>>> your patch, leaving PG_arch_{2,3} set on a page. The other issue was
>>>>> that I initially tried to do this only on the head page, but this
>>>>> did not go well with the folio_copy() -> copy_highpage() path, which
>>>>> expects the PG_arch_* flags on each individual page. The alternative
>>>>> was for arch_clear_hugetlb_flags() to iterate over all the pages in
>>>>> a folio.
>>>>
>>>> Thanks for pointing this out. I did miss this point. I took a quick
>>>> look at where the PG_ flags are set. IIUC, they are set by
>>>> post_alloc_hook() for order-0 anonymous folios (clearing both the
>>>> page and the tags) and by set_ptes() for the others (just clearing
>>>> the tags), for example, THP and hugetlb.
>>>>
>>>> I can see THP does set the PG_mte_tagged flag for each subpage. But
>>>> it seems it does not do so for hugetlb, if I read the code correctly.
>>>> The call path is:
>>>>
>>>>   hugetlb_fault() ->
>>>>     hugetlb_no_page() ->
>>>>       set_huge_pte_at() ->
>>>>         __set_ptes() ->
>>>>           __sync_cache_and_tags()
>>>>
>>>> __set_ptes() is called in a loop:
>>>>
>>>>   if (!pte_present(pte)) {
>>>>           for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>>>>                   __set_ptes(mm, addr, ptep, pte, 1);
>>>>           return;
>>>>   }
>>>>
>>>> The ncontig and pgsize are returned by num_contig_ptes(). For
>>>> example, for a 2M hugetlb page, ncontig is 1 and pgsize is 2M IIUC.
>>>> So actually just the head page has PG_mte_tagged set. If so,
>>>> copy_highpage() will just copy the old head page's flag (and tags) to
>>>> the new head page, while none of the subpages have PG_mte_tagged set.
>>>> Is this the expected behavior? I suppose we need the tags for every
>>>> subpage too, right?
>>>
>>> We would need something like the below to have the tags and the page
>>> flag set up for each subpage:
>>>
>>> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
>>> index 3f09ac73cce3..528164deef27 100644
>>> --- a/arch/arm64/mm/hugetlbpage.c
>>> +++ b/arch/arm64/mm/hugetlbpage.c
>>> @@ -228,9 +228,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>>  	int ncontig;
>>>  	unsigned long pfn, dpfn;
>>>  	pgprot_t hugeprot;
>>> +	unsigned long nr = sz >> PAGE_SHIFT;
>>>  
>>>  	ncontig = num_contig_ptes(sz, &pgsize);
>>>  
>>> +	__sync_cache_and_tags(pte, nr);
>>> +
>>>  	if (!pte_present(pte)) {
>>>  		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
>>>  			__set_ptes(mm, addr, ptep, pte, 1);
>>
>> We only need this for the last loop, when we have a present pte. I'd
>> also avoid calling __set_ptes(...., 1) if we call
>> __sync_cache_and_tags() here already. Basically just unroll
>> __set_ptes() in set_huge_pte_at() and call __set_pte() directly.
>>
>> It might be better to convert those page flag checks to only happen on
>> the head page. My stashed changes from over a year ago (before we had
>> more folio conversions) below. However, as I mentioned, I got stuck on
>> folio_copy(), which also does a cond_resched() between copy_highpage()
>> calls.
>
> Ping... Does this make sense and sound good? If it is good, I will try
> to implement it.
We can have the page flags set on the head page only for hugetlb pages. For copy_highpage(), we should be able to do something like the below:

	if page_is_head && page_is_hugetlb && page_has_mte_tagged
		set the page_mte_tagged flag
		copy the tags for all sub pages
	else	/* tail page or non-hugetlb page */
		current copy_highpage() implementation

The hugetlb folio can't go away under us since the migration path should pin it, so the state of the folio is stable. The preemption caused by cond_resched() should be fine too because of the pin, and the page table entry remains a migration entry until the migration is done, so everyone should just see the migration entry and wait for the migration to finish.

The other user of copy_highpage() of concern is uprobes, but it also pins the page before doing the copy, and it is called while holding the write mmap_lock.

IIUC, it should work if I'm not missing something. This should also have no impact on HVO. The overhead for the other users of copy_highpage() should be acceptable as well.

---------8<--------------------------------
diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c
index f5bcb0dc6267..a84ad0e46f12 100644
--- a/arch/arm64/kernel/mte.c
+++ b/arch/arm64/kernel/mte.c
@@ -49,7 +49,9 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
 		return;
 
 	if (try_page_mte_tagging(page)) {
-		mte_clear_page_tags(page_address(page));
+		unsigned long i, nr_pages = compound_nr(page);
+		for (i = 0; i < nr_pages; i++)
+			mte_clear_page_tags(page_address(page + i));
 		set_page_mte_tagged(page);
 	}
 }
@@ -57,22 +59,23 @@ static void mte_sync_page_tags(struct page *page, pte_t old_pte,
 void mte_sync_tags(pte_t old_pte, pte_t pte)
 {
 	struct page *page = pte_page(pte);
-	long i, nr_pages = compound_nr(page);
-	bool check_swap = nr_pages == 1;
+	bool check_swap = !PageCompound(page);
 	bool pte_is_tagged = pte_tagged(pte);
 
 	/* Early out if there's nothing to do */
 	if (!check_swap && !pte_is_tagged)
 		return;
 
+	/*
+	 * HugeTLB pages are always fully mapped, so only setting head page's
+	 * PG_mte_* flags is enough.
+	 */
+	if (PageHuge(page))
+		page = compound_head(page);
+
 	/* if PG_mte_tagged is set, tags have already been initialised */
-	for (i = 0; i < nr_pages; i++, page++) {
-		if (!page_mte_tagged(page)) {
-			mte_sync_page_tags(page, old_pte, check_swap,
-					   pte_is_tagged);
-			set_page_mte_tagged(page);
-		}
-	}
+	if (!page_mte_tagged(page))
+		mte_sync_page_tags(page, old_pte, check_swap, pte_is_tagged);
 
 	/* ensure the tags are visible before the PTE is set */
 	smp_wmb();
diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
index 5626ddb540ce..692198d8c0d2 100644
--- a/arch/arm64/kvm/guest.c
+++ b/arch/arm64/kvm/guest.c
@@ -1079,7 +1079,7 @@ long kvm_vm_ioctl_mte_copy_tags(struct kvm *kvm,
 
 			/* uaccess failed, don't leave stale tags */
 			if (num_tags != MTE_GRANULES_PER_PAGE)
-				mte_clear_page_tags(page);
+				mte_clear_page_tags(page_address(page));
 
 			set_page_mte_tagged(page);
 			kvm_release_pfn_dirty(pfn);
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 31d7fa4c7c14..b531b1d75cda 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1173,11 +1173,10 @@ static void sanitise_mte_tags(struct kvm *kvm, kvm_pfn_t pfn,
 	if (!kvm_has_mte(kvm))
 		return;
 
-	for (i = 0; i < nr_pages; i++, page++) {
-		if (try_page_mte_tagging(page)) {
-			mte_clear_page_tags(page_address(page));
-			set_page_mte_tagged(page);
-		}
+	if (try_page_mte_tagging(page)) {
+		for (i = 0; i < nr_pages; i++)
+			mte_clear_page_tags(page_address(page + i));
+		set_page_mte_tagged(page);
 	}
 }