On Wed, 2023-08-23 at 14:13 +0100, Alexandru Elisei wrote:
> Introduction
> ============
>
> Arm has implemented memory coloring in hardware, and the feature is
> called Memory Tagging Extensions (MTE). It works by embedding a 4-bit
> tag in bits 59..56 of a pointer, and storing this tag to a reserved
> memory location. When the pointer is dereferenced, the hardware
> compares the tag embedded in the pointer (logical tag) with the tag
> stored in memory (allocation tag).
>
> The relation between memory and where the tag for that memory is
> stored is static.
>
> The memory where the tags are stored has so far been inaccessible to
> Linux. This series aims to change that, by adding support for using
> the tag storage memory only as data memory; tag storage memory cannot
> itself be tagged.
>
>
> Implementation
> ==============
>
> The series is based on v6.5-rc3 with these two patches cherry-picked:
>
> - mm: Call arch_swap_restore() from unuse_pte():
>
>     https://lore.kernel.org/all/20230523004312.1807357-3-pcc@xxxxxxxxxx/
>
> - arm64: mte: Simplify swap tag restoration logic:
>
>     https://lore.kernel.org/all/20230523004312.1807357-4-pcc@xxxxxxxxxx/
>
> The above two patches are queued for the v6.6 merge window:
>
>     https://lore.kernel.org/all/20230702123821.04e64ea2c04dd0fdc947bda3@xxxxxxxxxxxxxxxxxxxx/
>
> The entire series, including the above patches, can be cloned with:
>
> $ git clone https://gitlab.arm.com/linux-arm/linux-ae.git \
> 	-b arm-mte-dynamic-carveout-rfc-v1
>
> On the arm64 architecture side, an extension is being worked on that
> will clarify how MTE tag storage reuse should behave. The extension
> will be made public soon.
>
> On the Linux side, MTE tag storage reuse is accomplished with the
> following changes:
>
> 1. The tag storage memory is exposed to the memory allocator as a new
> migratetype, MIGRATE_METADATA.
> It behaves similarly to MIGRATE_CMA, with the restriction that it
> cannot be used to allocate tagged memory (tag storage memory cannot be
> tagged). On tagged page allocation, the corresponding tag storage is
> reserved via alloc_contig_range().
>
> 2. mprotect(PROT_MTE) is implemented by changing the pte prot to
> PAGE_METADATA_NONE. When the page is next accessed, a fault is taken
> and the corresponding tag storage is reserved.
>
> 3. When the code tries to copy tags to a page which doesn't have the
> tag storage reserved, the tags are copied to an xarray and restored in
> set_pte_at(), when the page is eventually mapped with the tag storage
> reserved.
>
> KVM support has not been implemented yet, because a non-MTE enabled
> VMA can back the memory of an MTE-enabled VM. Once there is consensus
> on the right approach for the memory management support, I will add
> it.
>
> Explanations for the last two changes follow. The gist of it is that
> they were added mostly because of races, and it is my intention to
> make the code more robust.
>
> PAGE_METADATA_NONE was introduced to avoid races with
> mprotect(PROT_MTE). For example, migration can race with
> mprotect(PROT_MTE):
> - thread 0 initiates migration for a page in a non-MTE enabled VMA,
>   and a destination page is allocated without tag storage.
> - thread 1 handles an mprotect(PROT_MTE), the VMA becomes tagged, and
>   an access turns the source page that is in the process of being
>   migrated into a tagged page.
> - thread 0 finishes migration and the destination page is mapped as
>   tagged, but without tag storage reserved.
> More details and examples can be found in the patches.
>
> This race is also related to how tag restoring is handled when tag
> storage is missing: when a tagged page is swapped out, the tags are
> saved in an xarray indexed by swp_entry.val.
> When a page is swapped back in, if there are tags corresponding to the
> swp_entry that the page will replace, the tags are unconditionally
> restored, even if the page will be mapped as untagged. Because the
> page will be mapped as untagged, tag storage was not reserved when the
> page was allocated to replace the swp_entry which has tags associated
> with it.
>
> To get around this, save the tags in a new xarray, this time indexed
> by pfn, and restore them when the same page is mapped as tagged.
>
> This also solves another race, this time with copy_highpage(). In the
> scenario where migration races with mprotect(PROT_MTE), before the
> page is mapped, the contents of the source page are copied to the
> destination. This includes the tags, which will be copied to a page
> with missing tag storage, which can lead to data corruption if the
> missing tag storage is in use for data. So copy_highpage() has
> received a similar treatment to the swap code, and the source tags are
> copied to the xarray indexed by the destination page pfn.
>
>
> Overview of the patches
> =======================
>
> Patches 1-3 do some preparatory work by renaming a few functions and a
> gfp flag.
>
> Patches 4-12 are arch independent and introduce MIGRATE_METADATA to
> the page allocator.
>
> Patches 13-18 are arm64 specific and add support for detecting the tag
> storage region and onlining it with the MIGRATE_METADATA migratetype.
>
> Patches 19-24 are arch independent and modify the page allocator to
> call back into arch dependent functions to reserve metadata storage
> for an allocation which requires metadata.
>
> Patches 25-28 are mostly arm64 specific and implement the reservation
> and freeing of tag storage on tagged page allocation. Patch #28 ("mm:
> sched: Introduce PF_MEMALLOC_ISOLATE") adds a current flag,
> PF_MEMALLOC_ISOLATE, which ignores page isolation limits; this is
> used by arm64 when reserving tag storage in the same patch.
>
> Patches 29-30 add arch independent support for doing
> mprotect(PROT_MTE) when metadata storage is enabled.
>
> Patches 31-37 are mostly arm64 specific and handle the restoring of
> tags when tag storage is missing. The exceptions are patches 32 (adds
> the arch_swap_prepare_to_restore() function) and 35 (adds
> PAGE_METADATA_NONE support for THPs).
>
> Testing
> =======
>
> To enable MTE dynamic tag storage:
>
> - CONFIG_ARM64_MTE_TAG_STORAGE=y
> - system_supports_mte() returns true
> - kasan_hw_tags_enabled() returns false
> - correct DTB node (for the specification, see commit "arm64: mte:
>   Reserve tag storage memory")
>
> Check dmesg for the message "MTE tag storage enabled" or grep for
> metadata in /proc/vmstat.
>
> I've tested the series using FVP with MTE enabled, but without support
> for dynamic tag storage reuse. To simulate it, I've added two fake tag
> storage regions in the DTB by splitting a 2GB region roughly into 33
> slices of size 0x3e0_0000, and using 32 of them for tagged memory and
> one slice for tag storage:
>
> diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> index 60472d65a355..bd050373d6cf 100644
> --- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> +++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts
> @@ -165,10 +165,28 @@ C1_L2: l2-cache1 {
>  		};
>  	};
>  
> -	memory@80000000 {
> +	memory0: memory@80000000 {
>  		device_type = "memory";
> -		reg = <0x00000000 0x80000000 0 0x80000000>,
> -		      <0x00000008 0x80000000 0 0x80000000>;
> +		reg = <0x00 0x80000000 0x00 0x7c000000>;
> +	};
> +
> +	metadata0: metadata@c0000000 {
> +		compatible = "arm,mte-tag-storage";
> +		reg = <0x00 0xfc000000 0x00 0x3e00000>;
> +		block-size = <0x1000>;
> +		memory = <&memory0>;
> +	};
> +
> +	memory1: memory@880000000 {
> +		device_type = "memory";
> +		reg = <0x08 0x80000000 0x00 0x7c000000>;
> +	};
> +
> +	metadata1: metadata@8c0000000 {
> +		compatible = "arm,mte-tag-storage";
> +		reg = <0x08 0xfc000000 0x00 0x3e00000>;
> +		block-size = <0x1000>;
> +		memory = <&memory1>;
>  	};
>

Hi Alexandru,

AFAIK, the above memory configuration means that there are two regions
of DRAM (0x80000000-0xfc000000 and 0x8_80000000-0x8_fc000000), and this
is called the PDD memory map.

Document [1] says there are some constraints on tag memory, as below:

| The following constraints apply to the tag regions in DRAM:
| 1. The tag region cannot be interleaved with the data region.
| The tag region must also be above the data region within DRAM.
|
| 2. The tag region in the physical address space cannot straddle
| multiple regions of a memory map.
|
| PDD memory map is not allowed to have part of the tag region between
| 2GB-4GB and another part between 34GB-64GB.

I'm not sure if we can separate tag memory with the above
configuration. Or am I missing something?

[1] https://developer.arm.com/documentation/101569/0300/?lang=en
(Section 5.4.6.1)

Thanks,
Kuan-Ying Lee

>  	reserved-memory {
>
>
> Alexandru Elisei (37):
>   mm: page_alloc: Rename gfp_to_alloc_flags_cma ->
>     gfp_to_alloc_flags_fast
>   arm64: mte: Rework naming for tag manipulation functions
>   arm64: mte: Rename __GFP_ZEROTAGS to __GFP_TAGGED
>   mm: Add MIGRATE_METADATA allocation policy
>   mm: Add memory statistics for the MIGRATE_METADATA allocation
>     policy
>   mm: page_alloc: Allocate from movable pcp lists only if
>     ALLOC_FROM_METADATA
>   mm: page_alloc: Bypass pcp when freeing MIGRATE_METADATA pages
>   mm: compaction: Account for free metadata pages in
>     __compact_finished()
>   mm: compaction: Handle metadata pages as source for direct
>     compaction
>   mm: compaction: Do not use MIGRATE_METADATA to replace pages with
>     metadata
>   mm: migrate/mempolicy: Allocate metadata-enabled destination page
>   mm: gup: Don't allow longterm pinning of MIGRATE_METADATA pages
>   arm64: mte: Reserve tag storage memory
>   arm64: mte: Expose tag storage pages to the MIGRATE_METADATA
>     freelist
>   arm64: mte: Make tag storage depend on ARCH_KEEP_MEMBLOCK
>   arm64: mte: Move
>     tag storage to MIGRATE_MOVABLE when MTE is disabled
>   arm64: mte: Disable dynamic tag storage management if HW KASAN is
>     enabled
>   arm64: mte: Check that tag storage blocks are in the same zone
>   mm: page_alloc: Manage metadata storage on page allocation
>   mm: compaction: Reserve metadata storage in compaction_alloc()
>   mm: khugepaged: Handle metadata-enabled VMAs
>   mm: shmem: Allocate metadata storage for in-memory filesystems
>   mm: Teach vma_alloc_folio() about metadata-enabled VMAs
>   mm: page_alloc: Teach alloc_contig_range() about MIGRATE_METADATA
>   arm64: mte: Manage tag storage on page allocation
>   arm64: mte: Perform CMOs for tag blocks on tagged page
>     allocation/free
>   arm64: mte: Reserve tag block for the zero page
>   mm: sched: Introduce PF_MEMALLOC_ISOLATE
>   mm: arm64: Define the PAGE_METADATA_NONE page protection
>   mm: mprotect: arm64: Set PAGE_METADATA_NONE for mprotect(PROT_MTE)
>   mm: arm64: Set PAGE_METADATA_NONE in set_pte_at() if missing
>     metadata storage
>   mm: Call arch_swap_prepare_to_restore() before arch_swap_restore()
>   arm64: mte: swap/copypage: Handle tag restoring when missing tag
>     storage
>   arm64: mte: Handle fatal signal in reserve_metadata_storage()
>   mm: hugepage: Handle PAGE_METADATA_NONE faults for huge pages
>   KVM: arm64: Disable MTE is tag storage is enabled
>   arm64: mte: Enable tag storage management
>
>  arch/arm64/Kconfig                       |  13 +
>  arch/arm64/include/asm/assembler.h       |  10 +
>  arch/arm64/include/asm/memory_metadata.h |  49 ++
>  arch/arm64/include/asm/mte-def.h         |  16 +-
>  arch/arm64/include/asm/mte.h             |  40 +-
>  arch/arm64/include/asm/mte_tag_storage.h |  36 ++
>  arch/arm64/include/asm/page.h            |   5 +-
>  arch/arm64/include/asm/pgtable-prot.h    |   2 +
>  arch/arm64/include/asm/pgtable.h         |  33 +-
>  arch/arm64/kernel/Makefile               |   1 +
>  arch/arm64/kernel/elfcore.c              |  14 +-
>  arch/arm64/kernel/hibernate.c            |  46 +-
>  arch/arm64/kernel/mte.c                  |  31 +-
>  arch/arm64/kernel/mte_tag_storage.c      | 667 +++++++++++++++++++++++
>  arch/arm64/kernel/setup.c                |   7 +
>  arch/arm64/kvm/arm.c                     |   6 +-
>  arch/arm64/lib/mte.S                     |  30 +-
>  arch/arm64/mm/copypage.c                 |  26 +
>  arch/arm64/mm/fault.c                    |  35 +-
>  arch/arm64/mm/mteswap.c                  | 113 +++-
>  fs/proc/meminfo.c                        |   8 +
>  fs/proc/page.c                           |   1 +
>  include/asm-generic/Kbuild               |   1 +
>  include/asm-generic/memory_metadata.h    |  50 ++
>  include/linux/gfp.h                      |  10 +
>  include/linux/gfp_types.h                |  14 +-
>  include/linux/huge_mm.h                  |   6 +
>  include/linux/kernel-page-flags.h        |   1 +
>  include/linux/migrate_mode.h             |   1 +
>  include/linux/mm.h                       |  12 +-
>  include/linux/mmzone.h                   |  26 +-
>  include/linux/page-flags.h               |   1 +
>  include/linux/pgtable.h                  |  19 +
>  include/linux/sched.h                    |   2 +-
>  include/linux/sched/mm.h                 |  13 +
>  include/linux/vm_event_item.h            |   5 +
>  include/linux/vmstat.h                   |   2 +
>  include/trace/events/mmflags.h           |   5 +-
>  mm/Kconfig                               |   5 +
>  mm/compaction.c                          |  52 +-
>  mm/huge_memory.c                         | 109 ++++
>  mm/internal.h                            |   7 +
>  mm/khugepaged.c                          |   7 +
>  mm/memory.c                              | 180 +++++-
>  mm/mempolicy.c                           |   7 +
>  mm/migrate.c                             |   6 +
>  mm/mm_init.c                             |  23 +-
>  mm/mprotect.c                            |  46 ++
>  mm/page_alloc.c                          | 136 ++++-
>  mm/page_isolation.c                      |  19 +-
>  mm/page_owner.c                          |   3 +-
>  mm/shmem.c                               |  14 +-
>  mm/show_mem.c                            |   4 +
>  mm/swapfile.c                            |   4 +
>  mm/vmscan.c                              |   3 +
>  mm/vmstat.c                              |  13 +-
>  56 files changed, 1834 insertions(+), 161 deletions(-)
>  create mode 100644 arch/arm64/include/asm/memory_metadata.h
>  create mode 100644 arch/arm64/include/asm/mte_tag_storage.h
>  create mode 100644 arch/arm64/kernel/mte_tag_storage.c
>  create mode 100644 include/asm-generic/memory_metadata.h
>