The series is based on v6.7-rc1 and can be cloned with: $ git clone https://gitlab.arm.com/linux-arm/linux-ae.git \ -b arm-mte-dynamic-carveout-rfc-v2 Introduction ============ Memory Tagging Extension (MTE) is implemented currently to have a static carve-out of the DRAM to store the allocation tags (a.k.a. memory colour). This is what we call the tag storage. Each 16 bytes have 4 bits of tags, so this means 1/32 of the DRAM, roughly 3% used for the tag storage. This is done transparently by the hardware/interconnect (with firmware setup) and normally hidden from the OS. So a checked memory access to location X generates a tag fetch from location Y in the carve-out and this tag is compared with the bits 59:56 in the pointer. The correspondence from X to Y is linear (subject to a minimum block size to deal with some address interleaving). The software doesn't need to know about this correspondence as we have specific instructions like STG/LDG to location X that lead to a tag store/load to Y. Now, not all memory used by applications is tagged (mmap(PROT_MTE)). For example, some large allocations may not use PROT_MTE at all or only for the first and last page since initialising the tags takes time. The side-effect is that of that 3% of DRAM, only part of it, say 1%, is effectively used. The series aims to take that unused tag storage and release it to the page allocator for normal data usage. The first complication is that a PROT_MTE page allocation at address X will need to reserve the tag storage page at location Y (and migrate any data in that page if it is in use). To make things worse, pages in the tag storage/carve-out range cannot use PROT_MTE themselves on current hardware, so this adds the second complication - a heterogeneous memory layout. The kernel needs to know where to allocate a PROT_MTE page from or migrate a current page if it becomes PROT_MTE (mprotect()) and the range it is in does not support tagging. Some other complications are arm64-specific like cache coherency between tags and data accesses. There is a draft architecture spec which will be released soon, detailing how the hardware behaves. All of this will be entirely transparent to userspace. As with the current kernel (without this dynamic tag storage), a user only needs to ask for PROT_MTE mappings to get tagged pages. Implementation ============== MTE tag storage reuse is accomplished with the following changes to the Linux kernel: 1. The tag storage memory is exposed to the memory allocator as MIGRATE_CMA. The arm64 code manages this memory directly instead of using cma_declare_contiguous/cma_alloc for performance reasons. There is a limitation to this approach: MIGRATE_CMA cannot be used for tagged allocations, even if not all MIGRATE_CMA memory is tag storage. 2. mprotect(PROT_MTE) is implemented by adding a fault-on-access mechanism for existing pages. When a page is next accessed, a fault is taken and the corresponding tag storage is reserved. 3. When the code tries to copy tags to a page (when swapping in a newly allocated page, or during migration/THP collapse) which doesn't have the tag storage reserved, the tags are copied to an xarray and restored when tag storage is reserved for the destination page. KVM support has not been implemented yet, that because a non-MTE enabled VMA can back the memory of an MTE-enabled VM. It will be added after there is a consensus on the right approach on the memory management support. Overview of the patches ======================= For people not interested in the arm64 details, you probably want to start with patches 1-10, which mostly deal with adding the necessary hooks to the memory management code, and patches 19 and 20 which add the page fault-on-access mechanism for regular pages, respectively huge pages. Patch 21 is rather invasive, it moves the definition of struct migration_target_control out of mm/internal.h to migrate.h, and the arm64 code also uses isolate_lru_page() and putback_movable_pages() when migrating a tag storage page out of a PROT_MTE VMA. And finally patch 26 is an optimization for a performance regression that has been reported with Chrome and it introduces CONFIG_WANTS_TAKE_PAGE_OFF_BUDDY to allow arm64 to use take_page_off_buddy() to fast track reserving tag storage when the page is free. The rest of the patches are mostly arm64 specific. Patches 11-18 support for detecting the tag storage region and reserving tag storage when a tagged page is allocated. Patches 19-21 add the page fault-on-access on mechanism and use it to reserve tag storage when needed. Patches 22 and 23 handle saving tags temporarily to an xarray if the page doesn't have tag storage, and copying the tags over to the tagged page when tag storage is reserved. Changelog ========= Changes since RFC v1 [1]: * The entire series has been reworked to remove MIGRATE_METADATA and put tag storage pages on the MIGRATE_CMA freelists. * Changed how tags are saved and restored when copying them from one page to another if the destination page doesn't have tag storage - now the tags are restored when tag storage is reserved for the destination page instead of restoring them in set_pte_at() -> mte_sync_tags(). [1] https://lore.kernel.org/lkml/20230823131350.114942-1-alexandru.elisei@xxxxxxx/ Testing ======= To enable MTE dynamic tag storage: - CONFIG_ARM64_MTE_TAG_STORAGE=y - system_supports_mte() returns true - kasan_hw_tags_enabled() returns false - correct DTB node (for the specification, see commit "arm64: mte: Reserve tag storage memory") Check dmesg for the message "MTE tag storage region management enabled". I've tested the series using FVP with MTE enabled, but without support for dynamic tag storage reuse. To simulate it, I've added two fake tag storage regions in the DTB by splitting the upper 2GB memory region into 3: one region for normal RAM, followed by the tag storage for the lower 2GB of memory, then the tag storage for the normal RAM region. Like below: diff --git a/arch/arm64/boot/dts/arm/fvp-base-revc.dts b/arch/arm64/boot/dts/arm/fvp-base-revc.dts index 60472d65a355..8c719825a9b3 100644 --- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts +++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts @@ -165,12 +165,30 @@ C1_L2: l2-cache1 { }; }; - memory@80000000 { + memory0: memory@80000000 { device_type = "memory"; - reg = <0x00000000 0x80000000 0 0x80000000>, - <0x00000008 0x80000000 0 0x80000000>; + reg = <0x00000000 0x80000000 0 0x80000000>; }; + memory1: memory@880000000 { + device_type = "memory"; + reg = <0x00000008 0x80000000 0 0x78000000>; + }; + + tags0: tag-storage@8f8000000 { + compatible = "arm,mte-tag-storage"; + reg = <0x00000008 0xf8000000 0 0x4000000>; + block-size = <0x1000>; + memory = <&memory0>; + }; + + tags1: tag-storage@8fc000000 { + compatible = "arm,mte-tag-storage"; + reg = <0x00000008 0xfc000000 0 0x3c00000>; + block-size = <0x1000>; + memory = <&memory1>; + }; + reserved-memory { #address-cells = <2>; #size-cells = <2>; Alexandru Elisei (27): arm64: mte: Rework naming for tag manipulation functions arm64: mte: Rename __GFP_ZEROTAGS to __GFP_TAGGED mm: cma: Make CMA_ALLOC_SUCCESS/FAIL count the number of pages mm: migrate/mempolicy: Add hook to modify migration target gfp mm: page_alloc: Add an arch hook to allow prep_new_page() to fail mm: page_alloc: Allow an arch to hook early into free_pages_prepare() mm: page_alloc: Add an arch hook to filter MIGRATE_CMA allocations mm: page_alloc: Partially revert "mm: page_alloc: remove stale CMA guard code" mm: Allow an arch to hook into folio allocation when VMA is known mm: Call arch_swap_prepare_to_restore() before arch_swap_restore() arm64: mte: Reserve tag storage memory arm64: mte: Add tag storage pages to the MIGRATE_CMA migratetype arm64: mte: Make tag storage depend on ARCH_KEEP_MEMBLOCK arm64: mte: Disable dynamic tag storage management if HW KASAN is enabled arm64: mte: Check that tag storage blocks are in the same zone arm64: mte: Manage tag storage on page allocation arm64: mte: Perform CMOs for tag blocks on tagged page allocation/free arm64: mte: Reserve tag block for the zero page mm: mprotect: Introduce PAGE_FAULT_ON_ACCESS for mprotect(PROT_MTE) mm: hugepage: Handle huge page fault on access mm: arm64: Handle tag storage pages mapped before mprotect(PROT_MTE) arm64: mte: swap: Handle tag restoring when missing tag storage arm64: mte: copypage: Handle tag restoring when missing tag storage arm64: mte: Handle fatal signal in reserve_tag_storage() KVM: arm64: Disable MTE if tag storage is enabled arm64: mte: Fast track reserving tag storage when the block is free arm64: mte: Enable dynamic tag storage reuse arch/arm64/Kconfig | 16 + arch/arm64/include/asm/assembler.h | 10 + arch/arm64/include/asm/mte-def.h | 16 +- arch/arm64/include/asm/mte.h | 43 +- arch/arm64/include/asm/mte_tag_storage.h | 75 +++ arch/arm64/include/asm/page.h | 5 +- arch/arm64/include/asm/pgtable-prot.h | 2 + arch/arm64/include/asm/pgtable.h | 96 +++- arch/arm64/kernel/Makefile | 1 + arch/arm64/kernel/elfcore.c | 14 +- arch/arm64/kernel/hibernate.c | 46 +- arch/arm64/kernel/mte.c | 12 +- arch/arm64/kernel/mte_tag_storage.c | 686 +++++++++++++++++++++++ arch/arm64/kernel/setup.c | 7 + arch/arm64/kvm/arm.c | 6 +- arch/arm64/lib/mte.S | 34 +- arch/arm64/mm/copypage.c | 59 ++ arch/arm64/mm/fault.c | 261 ++++++++- arch/arm64/mm/mteswap.c | 162 +++++- fs/proc/page.c | 1 + include/linux/gfp_types.h | 14 +- include/linux/huge_mm.h | 2 + include/linux/kernel-page-flags.h | 1 + include/linux/migrate.h | 12 +- include/linux/migrate_mode.h | 1 + include/linux/mmzone.h | 5 + include/linux/page-flags.h | 16 +- include/linux/pgtable.h | 54 ++ include/trace/events/mmflags.h | 5 +- mm/Kconfig | 7 + mm/cma.c | 4 +- mm/huge_memory.c | 5 +- mm/internal.h | 9 - mm/memory-failure.c | 8 +- mm/memory.c | 10 + mm/mempolicy.c | 3 + mm/migrate.c | 3 + mm/page_alloc.c | 118 +++- mm/shmem.c | 14 +- mm/swapfile.c | 7 + 40 files changed, 1668 insertions(+), 182 deletions(-) create mode 100644 arch/arm64/include/asm/mte_tag_storage.h create mode 100644 arch/arm64/kernel/mte_tag_storage.c base-commit: b85ea95d086471afb4ad062012a4d73cd328fa86 -- 2.42.1