Re: [PATCH v7 29/29] arm64: mte: Add Memory Tagging Extension documentation

Szabolcs Nagy <szabolcs.nagy@xxxxxxx> · Mon, 10 Aug 2020 15:13:09 +0100

The 08/07/2020 16:19, Catalin Marinas wrote:
> On Mon, Aug 03, 2020 at 01:43:10PM +0100, Szabolcs Nagy wrote:
> > if we can always turn sync tag checks on early whenever mte is
> > available then i think there is no issue.
> > 
> > but if we have to make the decision later for compatibility or
> > performance reasons then per thread setting is problematic.
> 
> At least for libc, I'm not sure how you could even turn MTE on at
> run-time. The heap allocations would have to be mapped with PROT_MTE as
> we can't easily change them (well, you could mprotect(), assuming the
> user doesn't use tagged pointers on them).

e.g. dlopen of library with stack tagging.
(libc can mark stacks with PROT_MTE at that time)

or just turn on sync tag checks later when using
heap tagging.

> 
> There is a case to switch tag checking from asynchronous to synchronous
> at run-time based on a signal but that's rather specific to Android
> where zygote controls the signal handler. I don't think you can do this
> with libc. Even on Android, since the async fault signal is delivered
> per thread, it probably does this lazily (alternatively, it could issue
> a SIGUSRx to the other threads for synchronisation).

i think what that zygote is doing is a valid use-case but
in a normal linux setup the application owns the signal
handlers so the tag check switch has to be done by the
application. the libc can expose some api for it, so in
principle it's enough if the libc can do the runtime
switch, but we dont plan to add new libc apis for mte.

> > use of the prctl outside of libc is very limited if it's per thread
> > only:
> 
> In the non-Android context, I think the prctl() for MTE control should
> be restricted to the libc. You can control the mode prior to the process
> being started using environment variables. I really don't see how the
> libc could handle the changing of the MTE behaviour at run-time without
> itself handling signals.
> 
> > - application code may use it in a (elf specific) pre-initialization
> >   function, but that's a bit obscure (not exposed in c) and it is
> >   reasonable for an application to enable mte checks after it
> >   registered a signal handler for mte faults. (and at that point it
> >   may be multi-threaded).
> 
> Since the app can install signal handlers, it can also deal with
> notifying other threads with a SIGUSRx, assuming that it decided this
> after multiple threads were created. If it does this while
> single-threaded, subsequent threads would inherit the first one.

the application does not know what libraries create what
threads in the background, i dont think there is a way to
send signals to each thread (e.g. /proc/self/task cannot
be read atomically with respect to thread creation/exit).

the libc controls thread creation and exit so it can have
a list of threads it can notify, but an application cannot
do this. (libc could provide an api so applications can
do some per thread operation, but a libc would not do this
happily: currently there are locks around thread creation
and exit that are only needed for this "signal all threads"
mechanism which makes it hard to expose to users)

one way applications sometimes work this around is to
self re-exec. but that's a big hammer and not entirely
reliable (e.g. the exe may not be available on the
filesystem any more or the commandline args may need
to be preserved but they are clobbered, or some complex
application state needs to be recreated etc)

> 
> The only use-case I see for doing this in the kernel is if the code
> requiring an MTE behaviour change cannot install signal handlers. More
> on this below.
> 
> > - library code normally initializes per thread state on the first call
> >   into the library from a given thread, but with mte, as soon as
> >   memory / pointers are tagged in one thread, all threads are
> >   affected: not performing checks in other threads is less secure (may
> >   be ok) and it means incompatible syscall abi (not ok). so at least
> >   PR_TAGGED_ADDR_ENABLE should have process wide setting for this
> >   usage.
> 
> My assumption with MTE is that the libc will initialise it when the
> library is loaded (something __attribute__((constructor))) and it's
> still in single-threaded mode. Does it wait until the first malloc()
> call? Also, is there such thing as a per-thread initialiser for a
> dynamic library (not sure it can be implemented in practice though)?

there is no per thread initializer in an elf module.
(tls state is usually initialized lazily in threads
when necessary.)

malloc calls can happen before the ctors of an LD_PRELOAD
library and threads can be created before both.
glibc runs ldpreload ctors after other library ctors.

custom allocator can be of course dlopened.
(i'd expect several language runtimes to have their
own allocator and support dlopening the runtime)

> 
> The PR_TAGGED_ADDR_ENABLE synchronisation at least doesn't require IPIs
> to other CPUs to change the hardware state. However, it can still race
> with thread creation or a prctl() on another thread, not sure what we
> can define here, especially as it depends on the kernel internals: e.g.
> thread creation copies some data structures of the calling thread but at
> the same time another thread wants to change such structures for all
> threads of that process. The ordering of events here looks pretty
> fragile.
> 
> Maybe with another global status (per process) which takes priority over
> the per thread one would be easier. But such priority is not temporal
> (i.e. whoever called prctl() last) but pretty strict: once a global
> control was requested, it will remain global no matter what subsequent
> threads request (or we can do it the other way around).

i see.

> > but i guess it is fine to design the mechanism for these in a later
> > linux version, until then such usage will be unreliable (will depend
> > on how early threads are created).
> 
> Until we have a real use-case, I'd not complicate the matters further.
> For example, I'm still not sure how realistic it is for an application
> to load a new heap allocator after some threads were created. Even the
> glibc support, I don't think it needs this.
> 
> Could an LD_PRELOADED library be initialised after threads were created
> (I guess it could if another preloaded library created threads)? Even if
> it does, do we have an example or it's rather theoretical.

i believe this happens e.g. in applications built
with tsan. (the thread sanitizer creates a
background thread early which i think does not
call malloc itself but may want to access
malloced memory, but i don't have a setup with
tsan support to test this)

> 
> If this becomes an essential use-case, we can look at adding a new flag
> for prctl() which would set the option globally, with the caveats
> mentioned above. It doesn't need to be in the initial ABI (and the
> PR_TAGGED_ADDR_ENABLE is already upstream).
> 
> Thanks.
> 
> -- 
> Catalin