Re: [PATCH] hugetlbfs: add MTE support

On 7/2/24 5:04 PM, Yang Shi wrote:


On 7/2/24 5:34 AM, Catalin Marinas wrote:
On Tue, Jun 25, 2024 at 04:37:17PM -0700, Yang Shi wrote:
MTE can be supported on RAM-based filesystems; it is already supported on tmpfs.
There are use cases for MTE on hugetlbfs as well, so add MTE support.

Signed-off-by: Yang Shi <yang@xxxxxxxxxxxxxxxxxxxxxx>
---
  fs/hugetlbfs/inode.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index ecad73a4f713..c34faef62daf 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -110,7 +110,7 @@ static int hugetlbfs_file_mmap(struct file *file, struct vm_area_struct *vma)
       * way when do_mmap unwinds (may be important on powerpc
       * and ia64).
       */
-    vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND);
+    vm_flags_set(vma, VM_HUGETLB | VM_DONTEXPAND | VM_MTE_ALLOWED);
      vma->vm_ops = &hugetlb_vm_ops;
Last time I checked, about a year ago, this was not sufficient. One
issue is that there's no arch_clear_hugetlb_flags() implemented by your
patch, leaving PG_arch_{2,3} set on a page. The other issue was that I
initially tried to do this only on the head page but this did not go
well with the folio_copy() -> copy_highpage() which expects the
PG_arch_* flags on each individual page. The alternative was for
arch_clear_hugetlb_flags() to iterate over all the pages in a folio.

Thanks for pointing this out. I did miss this point. I took a quick look at where the PG_ flags are set. IIUC, they are set by post_alloc_hook() for order-0 anonymous folios (which clears both the page and the tags) and by set_ptes() for the others, for example THP and hugetlb (which just clears the tags).
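
For reference, this is roughly the per-page loop that __sync_cache_and_tags() ends up driving (paraphrased from my reading of arch/arm64/kernel/mte.c, not a verbatim copy), which is why the nr argument decides how many subpages get PG_mte_tagged:

    /* Sketch of mte_sync_tags(), simplified from memory */
    void mte_sync_tags(pte_t pte, unsigned int nr_pages)
    {
        struct page *page = pte_page(pte);
        unsigned int i;

        /*
         * Only the pages reached by this loop have their tags cleared
         * and PG_mte_tagged set.
         */
        for (i = 0; i < nr_pages; i++, page++) {
            if (try_page_mte_tagging(page)) {
                mte_clear_page_tags(page_address(page));
                set_page_mte_tagged(page);
            }
        }
    }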

I can see THP does set the PG_mte_tagged flag for each subpage, but it seems this does not happen for hugetlb, if I read the code correctly. The call path is:

hugetlb_fault() ->
  hugetlb_no_page() ->
    set_huge_pte_at() ->
      __set_ptes() ->
        __sync_cache_and_tags()


__set_ptes() is called in a loop in set_huge_pte_at():

    if (!pte_present(pte)) {
        for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
            __set_ptes(mm, addr, ptep, pte, 1);
        return;
    }

The ncontig and pgsize come from num_contig_ptes(). For a 2M hugetlb page, for example, ncontig is 1 and pgsize is 2M IIUC. So actually just the head page gets PG_mte_tagged set, and copy_highpage() will only copy the old head page's flag (and its tags) to the new head page. None of the subpages have PG_mte_tagged set.
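
A rough trace of that path for a 2M (PMD-level) hugetlb page, assuming my reading of num_contig_ptes() is right:

    set_huge_pte_at(mm, addr, ptep, pte, SZ_2M)
        ncontig = num_contig_ptes(SZ_2M, &pgsize);  /* ncontig = 1, pgsize = 2M */
        __set_ptes(mm, addr, ptep, pte, 1);         /* nr == 1 */
            __sync_cache_and_tags(pte, 1);          /* only pte_page(pte), i.e. the head page */

For the contiguous sizes (e.g. 64K made of 4K base pages) ncontig is 16 and pgsize is PAGE_SIZE, so the loop does reach every base page; it is only the non-contiguous PMD/PUD sizes that end up with nr == 1.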


Is this the expected behavior? I suppose we need tags for every subpage too, right?

It seems we need something like the below to have the tags and page flags set up for each subpage:

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 3f09ac73cce3..528164deef27 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -228,9 +228,12 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
        int ncontig;
        unsigned long pfn, dpfn;
        pgprot_t hugeprot;
+       unsigned long nr = sz >> PAGE_SHIFT;

        ncontig = num_contig_ptes(sz, &pgsize);

+       __sync_cache_and_tags(pte, nr);
+
        if (!pte_present(pte)) {
                for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
                        __set_ptes(mm, addr, ptep, pte, 1);



I'd also like to see some tests added to
tools/testing/selftests/arm64/mte to exercise MAP_HUGETLB with PROT_MTE:
write/read tags, a series of mmap+munmap (mostly to check whether old page
flags are still around), force some copy-on-write. I don't think we
should merge the patch without proper tests.
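
A minimal userspace sketch of the sort of check such a test could do (hedged: this is not wired into the existing mte selftest helpers; the PROT_MTE fallback value, the 2M default hugepage size and the prctl include mask follow the MTE documentation example, and hugepages must be reserved beforehand, e.g. via /proc/sys/vm/nr_hugepages):

    /* Build with: gcc -O2 -march=armv8.5-a+memtag mte_hugetlb.c */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>
    #include <linux/prctl.h>

    #ifndef PROT_MTE
    #define PROT_MTE 0x20
    #endif

    int main(void)
    {
            size_t len = 2UL << 20;         /* one 2M hugepage (assumed default size) */
            char *p, *tagged;

            /* Enable tagged addresses and synchronous MTE faults for this task. */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC |
                      (0xfffe << PR_MTE_TAG_SHIFT), 0, 0, 0)) {
                    perror("prctl");
                    return 1;
            }

            /* Without the hugetlbfs patch this mmap() is expected to fail. */
            p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_MTE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
            if (p == MAP_FAILED) {
                    perror("mmap");
                    return 1;
            }

            /* Pick a random non-zero tag and store it on the first 16-byte granule. */
            asm volatile("irg %0, %1" : "=r" (tagged) : "r" (p));
            asm volatile("stg %0, [%0]" : : "r" (tagged) : "memory");

            /* Access through the tagged pointer; a tag mismatch would fault. */
            tagged[0] = 42;
            printf("tagged access OK: %d\n", tagged[0]);

            munmap(p, len);
            return 0;
    }

The real test would additionally re-mmap the same range without PROT_MTE and check that stale tags/page flags do not leak through, and fork()+write to force copy-on-write of a tagged hugepage.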

An untested hunk on top of your changes:

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 3954cbd2ff56..5357b00b9087 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -20,7 +20,19 @@ extern bool arch_hugetlb_migration_supported(struct hstate *h);
    static inline void arch_clear_hugetlb_flags(struct folio *folio)
  {
-    clear_bit(PG_dcache_clean, &folio->flags);
+    unsigned long i, nr_pages = folio_nr_pages(folio);
+    const unsigned long clear_flags = BIT(PG_dcache_clean) |
+        BIT(PG_arch_2) | BIT(PG_arch_3);
+
+    if (!system_supports_mte()) {
+        clear_bit(PG_dcache_clean, &folio->flags);
+        return;
+    }
+
+    for (i = 0; i < nr_pages; i++) {
+        struct page *page = folio_page(folio, i);
+        page->flags &= ~clear_flags;
+    }
  }
  #define arch_clear_hugetlb_flags arch_clear_hugetlb_flags

diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 5966ee4a6154..304dfc499e68 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -28,7 +28,8 @@ static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags)
      * backed by tags-capable memory. The vm_flags may be overridden by a
       * filesystem supporting MTE (RAM-based).
       */
-    if (system_supports_mte() && (flags & MAP_ANONYMOUS))
+    if (system_supports_mte() &&
+        (flags & (MAP_ANONYMOUS | MAP_HUGETLB)))
          return VM_MTE_ALLOWED;

Do we really need this change? IIRC, mmap_region() will call hugetlbfs's ->mmap(), which sets VM_MTE_ALLOWED in vma->vm_flags, and then update vma->vm_page_prot with the new vma->vm_flags.

If this change were needed, MTE for tmpfs wouldn't work either, right?

        return 0;
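
For reference, my reading of the mmap() path that makes me think the arch_calc_vm_flag_bits() change is not required (a rough trace, not exact code; the ordering is what matters):

    do_mmap()
        arch_calc_vm_prot_bits(PROT_MTE)        /* -> VM_MTE */
        mmap_region()
            call_mmap()                         /* hugetlbfs_file_mmap() sets VM_MTE_ALLOWED */
            arch_validate_flags(vma->vm_flags)  /* VM_MTE only accepted with VM_MTE_ALLOWED */
            vma_set_page_prot(vma)              /* vm_page_prot recomputed from final vm_flags */

This is the same ordering that lets shmem set VM_MTE_ALLOWED from its ->mmap() today.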






