Re: [PATCH v2] arm64: allow TCR_EL1.TBID0 to be configured

Catalin Marinas <catalin.marinas@xxxxxxx> · Thu, 29 Jul 2021 18:44:09 +0100

On Wed, Jul 28, 2021 at 04:50:07PM -0700, Peter Collingbourne wrote:
> On Wed, Jul 28, 2021 at 9:42 AM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > On Tue, Jul 27, 2021 at 03:00:10PM -0700, Peter Collingbourne wrote:
> > > On Tue, Jul 27, 2021 at 9:51 AM Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
> > > > On Mon, Jun 21, 2021 at 10:12:04PM -0700, Peter Collingbourne wrote:
> > > > > Introduce a command line flag that controls whether TCR_EL1.TBID0
> > > > > is set at boot time. Since this is a change to the userspace ABI the
> > > > > option defaults to off for now, although it seems likely that we'll
> > > > > be able to change the default at some future point.
> > > > >
> > > > > Setting TCR_EL1.TBID0 increases the number of signature bits used by
> > > > > the pointer authentication instructions for instruction addresses by 8,
> > > > > which improves the security of pointer authentication, but it also has
> > > > > the consequence of changing the operation of the branch instructions
> > > > > so that they no longer ignore the top byte of the target address but
> > > > > instead fault if they are non-zero.
> > > >
> > > > I'm a bit uneasy about the ABI change and not so keen on constraining
> > > > the ABI through the kernel command line. Ideally we should make this an
> > > > opt-in per application (prctl()) but that has some aspects to address
> > >
> > > This doesn't necessarily need to be the end state, we can enhance this
> > > based on need. For example, we could choose to take this patch now and
> > > later implement the per-process opt-in where the default is controlled
> > > by the command line.
> >
> > What's the risk of an application becoming reliant on the new mode
> > (TBID0) and breaking with the old one? Probably very small but I haven't
> > figured out if it's zero. Depending on whether we have PAC or PAC2 (the
> > latter came with 8.6 but optional in 8.3) with TBID0, there are some
> > differences on how the PAC/AUT instructions work and the code generated
> > (XOR with the top bits).
> 
> I think it would be quite small. On Android, at least to begin with
> there would be a mixture of devices with different TBID0 settings
> (devices without PAC support and devices with older kernels would all
> have this disabled), so I think it would be difficult for an
> application to depend on it being enabled.
> 
> > > Or just implement the software-only per-process
> > > TBID0 almost-disablement which would be much simpler than doing it in
> > > hardware, if that is enough to satisfy future needs.
> >
> > I don't entirely follow this.
> 
> Sorry, I was referring to my earlier proposal for recovering from
> tagged PC in the kernel by clearing the tag bits:
> https://lore.kernel.org/linux-arm-kernel/CAMn1gO7+_JhTUw2gS6jnyRV+TCqprmpuCAfee3RCAhNjoVyy9w@xxxxxxxxxxxxxx/

Ah, sorry, I missed this one (or just paged it out entirely over the
holiday).

> > > Otherwise we risk adding "unused" complexity to the kernel that we can
> > > never remove due to API stability guarantees.
> >
> > We've had other debates over the years and, in general, if a kernel
> > change causes apps to break, we'd have to keep the original behaviour.
> > Are there any plans to fix the JITs tools you discovered?
> 
> Yes, we would definitely want to fix the JIT issue in the Android
> platform before rolling out a forward PAC ABI. This would be separate
> from fixing apps, which would need to opt into MTE (or address tagging
> via the target API level) anyway. But if it turns out that there are
> too many apps with these JITs that use MTE or address tagging, I think
> we would need to come back to the kernel to figure out some way to let
> these programs run.

OK, I guess we don't yet have a clear view on how many such apps are
affected.

> > Talking to Will about this he was wondering whether we could make TBID0
> > on by default and clear the tag in PC if we take a fault (on tagged PC),
> > restarting the context. PAC shouldn't be affected since we would only
> > branch to an authenticated (PAC code removed) pointer. If this works,
> > we'd only affect performance slightly on such apps but don't completely
> > break them.
> 
> Right, this sounds exactly like my earlier proposal.

Indeed. Something I haven't figured out yet is whether such handling in
the kernel would weaken PAC. For example, following a failed AUT (the
old style which does not trap), the resulting address would cause a
translation fault. Would the kernel clearing the top byte result in a
potentially valid address?

> > > > first: (a) this bit is permitted to be cached in the TLB so we'd need
> > > > some TLBI when setting it (and a clarification in the specs that it is
> > > > tagged by ASID/VMID, probably fine) and (b) we'd need to context-switch
> > > > TCR_EL1, with a small performance penalty (I don't think it's
> > > > significant but worth testing).
> > >
> > > So TLBI all of the CPUs on prctl() and context-switch TCR_EL1? I
> > > thought there would be DOS concerns with the first part of that?
> >
> > The DoS problem appears if we need to issue an IPI to all CPUs (like
> > stop_machine). The TLBI with broadcast handled in hardware should be OK
> > as it's targeted to a specific ASID. But this would have to be issued
> 
> I see -- I hadn't realised that this instruction is implemented as a
> broadcast. So we would just need to issue the instruction from any CPU
> and we should be good.

Well, as long as it's single-threaded when the prctl() is issued. If
there are multiple threads, we'd have to synchronise all the CPUs. Such
requirement is probably not a big deal anyway as it affects the return
addresses, so it would have to be done early.

> > before any app threads are started, otherwise we'd need to synchronise
> > TCR_EL1. Given that TBID0 toggling affects PAC, this can only be done
> 
> Right, so this would be different from everything currently in
> tagged_addr_ctrl because it would be per-process rather than
> per-thread. So if this were a true TBID0 control we may even want it
> as a separate prctl() since there are certainly use cases for changing
> the other bits of tagged_addr_ctrl while the other threads are
> running.

Since it's permitted to be cached in the TLB, switching it between
threads would require TLBI on each switch, so not feasible.

How early is the prctl(PR_TAGGED_ADDR_ENABLE) called? Anyway, it
probably makes more sense to have a separate control since TBID0 is not
necessarily linked to hwasan or MTE. We'd want more PAC bits even if
hwasan or MTE are not deployed.

> > safely very early in the application before return addresses get a PAC
> > code.
> 
> Right, so maybe the "almost-disablement" would work better since all
> of the signatures would then still be valid, and you could even have
> different settings for different threads if you wanted to (e.g. if you
> arranged to run legacy code only on a specific thread).

Yes, if (a) it doesn't weaken PAC and (b) there are no future plans to
use code TBI as trapping would make it slower.

I think we need to decide whether full TBI would be any useful going
forward. One comment from Szabolcs on the previous version was that
"data only tbi is a more complicated concept than plain tbi". Are there
any tooling plans to benefit from code TBI? Or any use-cases that would
be affected? For example, if we decide mmap() to return tagged pointers
(with a new flag), would JITs want to benefit?

If we need both data and code TBI to coexist, we should look into a
per-process control.

-- 
Catalin