Re: [PATCH] KVM: arm64: permit MAP_SHARED mappings with MTE enabled

Catalin Marinas <catalin.marinas@xxxxxxx> · Wed, 29 Jun 2022 20:15:07 +0100

On Tue, Jun 28, 2022 at 11:54:51AM -0700, Peter Collingbourne wrote:
> On Tue, Jun 28, 2022 at 10:58 AM Catalin Marinas
> <catalin.marinas@xxxxxxx> wrote:
> > That's why it would be interesting to see
> > the effect of using DC GZVA instead of DC ZVA for page zeroing.
> >
> > I suspect on Android you'd notice the fork() penalty a bit more with all
> > the copy-on-write having to copy tags. But we can't tell until we do
> > some benchmarks. If the penalty is indeed significant, we'll go back to
> > assessing the races here.
> 
> Okay, I can try to measure it. I do feel rather strongly though that
> we should try to avoid tagging pages as much as possible even ignoring
> the potential performance implications.
> 
> Here's one more idea: we can tag pages eagerly as you propose, but
> introduce an opt-out. For example, we introduce a MAP_NOMTE flag,
> which would prevent tag initialization as well as causing any future
> attempt to mprotect(PROT_MTE) to fail. Allocators that know that the
> memory will not be used for MTE in the future can set this flag. For
> example, Scudo can start setting this flag once MTE has been disabled
> as it has no mechanism for turning MTE back on once disabled. And that
> way we will end up with no tags on the heap in the processes with MTE
> disabled. Mappings with MAP_NOMTE would not be permitted in the guest
> memory space of MTE enabled guests. For executables mapped by the
> kernel we may consider adding a bit to the ELF program headers to
> enable MAP_NOMTE.

I don't like such negative flags and we should aim for minimal changes
to code that doesn't care about MTE. If there's a performance penalty
with zeroing the tags, we'll keep looking at the lazy tag
initialisation.

In the meantime, I'll think some more about the lazy approach. At the
very least, mte_sync_tags() needs fixing so that it sets PG_mte_tagged
only after the tags have been updated (this fixes the CoW + mprotect()
race but probably breaks concurrent MAP_SHARED mprotect()). We'd have to
add some barriers (maybe in a new function, set_page_tagged()). Some
cases like restoring from swap (both private and shared) have the page
lock held. KVM doesn't seem to take any page lock, so it can race with
the VMM.
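
To illustrate the ordering I have in mind, here is a rough userspace
sketch using C11 atomics rather than kernel code. All the sim_* names,
the struct layout and the tag array are made up for illustration (the
real tags live in tag memory, not in an array, and the kernel would use
its own barrier primitives). It only shows "write the tags first,
publish the flag last"; it does nothing about two tasks initialising
the same page concurrently, which is the MAP_SHARED case above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_PAGE_SIZE  4096
#define SIM_GRANULE    16                    /* MTE tags 16-byte granules */
#define SIM_NR_TAGS    (SIM_PAGE_SIZE / SIM_GRANULE)
#define SIM_PG_TAGGED  (1u << 0)             /* stand-in for PG_mte_tagged */

/* Hypothetical model of a page: a flags word plus one tag per granule. */
struct sim_page {
	atomic_uint flags;
	uint8_t tags[SIM_NR_TAGS];
};

/*
 * Publish the flag with release semantics so that every tag store made
 * before this call is visible to a reader that sees the flag set via
 * the acquire load in sim_page_tagged().
 */
static void sim_set_page_tagged(struct sim_page *page)
{
	atomic_fetch_or_explicit(&page->flags, SIM_PG_TAGGED,
				 memory_order_release);
}

/* Acquire load: pairs with the release in sim_set_page_tagged(). */
static bool sim_page_tagged(struct sim_page *page)
{
	return atomic_load_explicit(&page->flags, memory_order_acquire) &
	       SIM_PG_TAGGED;
}

/*
 * Lazy initialisation: write the tags first, set the flag last.
 * Note that two concurrent callers can both see the flag clear and
 * both write the tags; this sketch does not address that race.
 */
static void sim_sync_tags(struct sim_page *page, uint8_t tag)
{
	assert(tag < 16);                    /* MTE tags are 4 bits wide */

	if (sim_page_tagged(page))
		return;                      /* tags already initialised */

	for (size_t i = 0; i < SIM_NR_TAGS; i++)
		page->tags[i] = tag;

	sim_set_page_tagged(page);
}
```

A reader that checks sim_page_tagged() before looking at the tags never
sees the flag set with stale tags, which is the property the CoW +
mprotect() fix needs.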

Anyway, I doubt we can get away with a single bit. We can't make use of
PG_locked either since set_pte_at() is called with the page either
locked or unlocked.

-- 
Catalin