On Wed, Mar 11, 2020 at 03:17:54PM -0700, Richard Henderson wrote:
> On 2/26/20 10:05 AM, Catalin Marinas wrote:
> > +  /*
> > +   * From include/uapi/linux/prctl.h
> > +   */
> > +  #define PR_SET_TAGGED_ADDR_CTRL 55
> > +  #define PR_GET_TAGGED_ADDR_CTRL 56
> > +  # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
> > +  # define PR_MTE_TCF_SHIFT       1
> > +  # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
> > +  # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
> > +  # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
> > +  # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
> > +  # define PR_MTE_TAG_SHIFT       3
> > +  # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)
>
> Is there a reason not to include TCMA into the set of bits that userland can
> control with this prcrl?
>
> I know that ordinarily TCR_ELx requires expensive syncing, but for this
> particular field there is a note about "software may change this control bit on
> a context switch". Which I take to mean that the usual TLB-related syncing may
> be omitted.

TCMA (unlike TCF) is allowed to be cached in the TLB. If we are to allow
the user to configure this field, there are two approaches, each with
its own problems:

1. Per-thread TCMA (as we do with TCF). Since the field is cached in the
   TLB (ASID-tagged), we'd have to invalidate the TLB for that ASID
   every time we switch between threads of the same process on a CPU.

2. Per-process TCMA. This solves the problem of TLB invalidation.
   However, you'd have to synchronise all the threads that may run on
   other CPUs. A simple IPI (as in sys_membarrier() for example) is not
   sufficient since with CnP (CPU threads sharing the TLB) we'd need a
   synchronous update. This leaves us with a stop_machine() call and I'm
   not keen on exposing this to the user via a syscall.

If you have a strong need for TCMA in user space, please raise it and we
can discuss always allowing match-all tags for user tasks. Note that
the kernel will have match-all enabled for kernel addresses.

-- 
Catalin