The 04/21/2020 15:26, Catalin Marinas wrote:
> diff --git a/Documentation/arm64/memory-tagging-extension.rst b/Documentation/arm64/memory-tagging-extension.rst
> new file mode 100644
> index 000000000000..f82dfbd70061
> --- /dev/null
> +++ b/Documentation/arm64/memory-tagging-extension.rst
> @@ -0,0 +1,260 @@
> +===============================================
> +Memory Tagging Extension (MTE) in AArch64 Linux
> +===============================================
> +
> +Authors: Vincenzo Frascino <vincenzo.frascino@xxxxxxx>
> +         Catalin Marinas <catalin.marinas@xxxxxxx>
> +
> +Date: 2020-02-25
> +
> +This document describes the provision of the Memory Tagging Extension
> +functionality in AArch64 Linux.
> +
> +Introduction
> +============
> +
> +ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
> +feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
> +(Top Byte Ignore) feature and allows software to access a 4-bit
> +allocation tag for each 16-byte granule in the physical address space.
> +Such memory range must be mapped with the Normal-Tagged memory
> +attribute. A logical tag is derived from bits 59-56 of the virtual
> +address used for the memory access. A CPU with MTE enabled will compare
> +the logical tag against the allocation tag and potentially raise an
> +exception on mismatch, subject to system registers configuration.
> +
> +Userspace Support
> +=================
> +
> +When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
> +supported by the hardware, the kernel advertises the feature to
> +userspace via ``HWCAP2_MTE``.
> +
> +PROT_MTE
> +--------
> +
> +To access the allocation tags, a user process must enable the Tagged
> +memory attribute on an address range using a new ``prot`` flag for
> +``mmap()`` and ``mprotect()``:
> +
> +``PROT_MTE`` - Pages allow access to the MTE allocation tags.
> +
> +The allocation tag is set to 0 when such pages are first mapped in the
> +user address space and preserved on copy-on-write. ``MAP_SHARED`` is
> +supported and the allocation tags can be shared between processes.
> +
> +**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
> +RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
> +types of mapping will result in ``-EINVAL`` returned by these system
> +calls.
> +
> +**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
> +be cleared by ``mprotect()``.

i think there are some non-obvious madvise operations that may
be worth documenting too for mte specific semantics.

e.g. MADV_DONTNEED or MADV_FREE can presumably drop tags which
means that existing pointers can no longer write to the memory
which is a change of behaviour compared to the non-mte case.
(affects most malloc implementations that will have to deal
with this when implementing heap coloring)

there might be other similar problems like MADV_WIPEONFORK that
won't work as currently expected when mte is enabled.

if such behaviour changes cause serious problems to existing
software there may be a need to have a way to opt out from
these changes (e.g. MADV_ flag variant that only affects the
memory content but not the tags) or to make that the default
behaviour. (but i can't tell how widely these are used in
ways that can be expected to work with PROT_MTE)

> +Tag Check Faults
> +----------------
> +
> +When ``PROT_MTE`` is enabled on an address range and a mismatch between
> +the logical and allocation tags occurs on access, there are three
> +configurable behaviours:
> +
> +- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
> +  tag check fault.
> +
> +- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
> +  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
> +  memory access is not performed.
> +
> +- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the current
> +  thread, asynchronously following one or multiple tag check faults,
> +  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0``.
> +
> +**Note**: There are no *match-all* logical tags available for user
> +applications.
> +
> +The user can select the above modes, per thread, using the
> +``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
> +``flags`` contain one of the following values in the ``PR_MTE_TCF_MASK``
> +bit-field:
> +
> +- ``PR_MTE_TCF_NONE`` - *Ignore* tag check faults
> +- ``PR_MTE_TCF_SYNC`` - *Synchronous* tag check fault mode
> +- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode
> +
> +Tag checking can also be disabled for a user thread by setting the
> +``PSTATE.TCO`` bit with ``MSR TCO, #1``.
> +
> +**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
> +irrespective of the interrupted context.
> +
> +**Note**: Kernel accesses to user memory (e.g. ``read()`` system call)
> +are only checked if the current thread tag checking mode is
> +PR_MTE_TCF_SYNC.
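
For reference, the userspace sequence described in the quoted text (check
``HWCAP2_MTE``, select a tag check fault mode with ``prctl()``, then map
memory with ``PROT_MTE``) might look roughly like the sketch below. The
fallback constant values are an assumption taken from later mainline uapi
headers and may not match this version of the patch. The helper names
mte_set_tcf(), mte_available() and mte_map_sync() are hypothetical and only
used for illustration.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/auxv.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/*
 * Fallback definitions. ASSUMPTION: values copied from later mainline
 * uapi headers; the patch version under review may use different ones.
 */
#ifndef HWCAP2_MTE
#define HWCAP2_MTE              (1UL << 18)
#endif
#ifndef PROT_MTE
#define PROT_MTE                0x20
#endif
#ifndef PR_SET_TAGGED_ADDR_CTRL
#define PR_SET_TAGGED_ADDR_CTRL 55
#endif
#ifndef PR_MTE_TCF_MASK
#define PR_MTE_TCF_SHIFT        1
#define PR_MTE_TCF_NONE         (0UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_SYNC         (1UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_ASYNC        (2UL << PR_MTE_TCF_SHIFT)
#define PR_MTE_TCF_MASK         (3UL << PR_MTE_TCF_SHIFT)
#endif

/* Hypothetical helper: replace the tag check fault mode bit-field in a
 * control word, leaving all other bits untouched. */
unsigned long mte_set_tcf(unsigned long ctrl, unsigned long mode)
{
	return (ctrl & ~PR_MTE_TCF_MASK) | (mode & PR_MTE_TCF_MASK);
}

/* Hypothetical helper: non-zero if the kernel advertises MTE to userspace.
 * Always 0 when not built for AArch64. */
int mte_available(void)
{
#ifdef __aarch64__
	return (getauxval(AT_HWCAP2) & HWCAP2_MTE) != 0;
#else
	return 0;
#endif
}

/* Hypothetical helper: select synchronous tag check faults for the calling
 * thread and map an anonymous region with PROT_MTE.
 * Returns NULL if MTE is unavailable or any step fails. */
void *mte_map_sync(size_t len)
{
	void *p;

	if (!mte_available())
		return NULL;

	if (prctl(PR_SET_TAGGED_ADDR_CTRL,
		  mte_set_tcf(0, PR_MTE_TCF_SYNC), 0, 0, 0))
		return NULL;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_MTE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```

A caller would use ``mte_map_sync(4096)`` and fall back to a plain
``mmap()`` when it returns NULL. The sketch does not cover the madvise
cases raised above, since the tag semantics there are the open question.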