Re: [RFC PATCH v0 0/6] x86/AMD: Userspace address tagging

Catalin Marinas <catalin.marinas@xxxxxxx> · Fri, 8 Apr 2022 18:41:24 +0100

On Mon, Mar 21, 2022 at 03:29:34PM -0700, Andy Lutomirski wrote:
> On Thu, Mar 10, 2022, at 3:15 AM, Bharata B Rao wrote:
> > This patchset makes use of Upper Address Ignore (UAI) feature available
> > on upcoming AMD processors to provide user address tagging support for x86/AMD.
> >
> > UAI allows software to store a tag in the upper 7 bits of a logical
> > address [63:57]. When enabled, the processor will suppress the
> > traditional canonical address checks on the addresses. More information
> > about UAI can be found in section 5.10 of 'AMD64 Architecture
> > Programmer's Manual, Vol 2: System Programming' which is available from
> >
> > https://bugzilla.kernel.org/attachment.cgi?id=300549
> 
> I hate to be a pain, but I'm really not convinced that this feature is
> suitable for Linux.  There are a few reasons:
> 
> Right now, the concept that the high bit of an address determines
> whether it's a user or a kernel address is fairly fundamental to the
> x86_64 (and x86_32!) code.  It may not be strictly necessary to
> preserve this, but violating it would require substantial thought.
> With UAI enabled, kernel and user addresses are, functionally,
> interleaved.  This makes things like access_ok checks, and more
> generally anything that operates on a range of addresses, behave
> potentially quite differently.  A lot of auditing of existing code
> would be needed to make it safe.

Just catching up with this thread. I'm not entirely familiar with the
x86 codebase, but here are some points from the arm64 TBI (top-byte
ignore) feature that may be useful:

In the 52-bit VA configuration (maximum) the kernel addresses on arm64
start at 0xfff00000_00000000 and the user ones go up to
0x000fffff_ffffffff. Anything in between these addresses would trigger a
fault on access. So a non-zero top-byte, even with bit 63 set, would not
access any kernel address unless bits 52 to 63 are all 1 (and this would
fail the access_ok() check, see below).

On arm64 we had TBI from day 0 but the syscall ABI did not allow user
tagged pointers into the kernel. An access_ok() checking addr < TASK_SIZE
was sufficient. With the tagged address ABI, we wanted to allow user
addresses with a non-zero top byte into the kernel. The access_ok() was
changed to sign-extend from bit 55 before comparing with TASK_SIZE. The
hardware also uses bit 55 to select the user or the kernel page tables
(TTBR0/TTBR1_EL1 regs or current->mm->pgd vs swapper_pg_dir in Linux
terms).
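
To make the sign-extension step concrete, here is a rough userspace
sketch of the check described above. It is not the arm64 kernel code:
the sketch_* names and SKETCH_TASK_SIZE are made up for illustration,
and the 52-bit user VA limit is an assumption taken from the layout
mentioned earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 52-bit user VA, so user addresses end at 0x000fffff_ffffffff. */
#define SKETCH_TASK_SIZE (UINT64_C(1) << 52)

/* Copy bit 'bit' of 'value' into every higher bit (portable, unsigned only). */
static uint64_t sketch_sign_extend64(uint64_t value, unsigned int bit)
{
	assert(bit < 64);
	uint64_t sign = UINT64_C(1) << bit;
	uint64_t low = value & ((sign << 1) - 1); /* keep bits [bit:0] */
	return (low ^ sign) - sign;
}

/*
 * Hypothetical stand-in for access_ok(): drop the tag by sign-extending
 * from bit 55, then do the usual range check against the user limit.
 */
static bool sketch_access_ok(uint64_t addr, uint64_t size)
{
	uint64_t untagged = sketch_sign_extend64(addr, 55);

	return size <= SKETCH_TASK_SIZE && untagged <= SKETCH_TASK_SIZE - size;
}
```

A user pointer with any top byte and bit 55 clear collapses to its
untagged form and passes; anything with bit 55 set sign-extends to a
kernel-looking address and is rejected.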

I haven't looked at the AMD UAI feature but if it still selects the user
vs kernel page tables based on bit 63, there may be a potential problem.
However, if access_ok() ensures that bit 56 is 0 for valid user
addresses, such access would fault as it's below the kernel's
0xff000000_00000000 limit (if I got it correctly for x86).

Since the UAI goes from bit 57 and up, I have a suspicion that it keeps
bit 56 for user vs kernel address selection. An access_ok()
sign-extending from this bit should be sufficient. As I said above,
there's no risk if such addresses get past access_ok(). With bit 56
cleared they'd not be able to access any kernel data.

(that's unless I missed something in the x86 kernel address layout)
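
A small sketch of that argument, under loud assumptions: that tags
live in bits [63:57], that bit 56 is what separates user from kernel,
that the user limit is 1 << 56 and that kernel addresses start at
0xff000000_00000000. None of this is checked against the AMD manual
or the x86 code, and all sketch_* names are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TAG_SHIFT    57                              /* tag in [63:57] */
#define SKETCH_USER_LIMIT   (UINT64_C(1) << 56)             /* assumed */
#define SKETCH_KERNEL_START UINT64_C(0xff00000000000000)    /* assumed */

/* Replace bits [63:57] of 'addr' with a 7-bit tag, keeping bits [56:0]. */
static uint64_t sketch_set_tag(uint64_t addr, unsigned int tag)
{
	assert(tag < 128);
	uint64_t low = addr & ((UINT64_C(1) << SKETCH_TAG_SHIFT) - 1);
	return ((uint64_t)tag << SKETCH_TAG_SHIFT) | low;
}

/* Sign-extend from bit 56, discarding whatever tag sits above it. */
static uint64_t sketch_untag(uint64_t addr)
{
	uint64_t sign = UINT64_C(1) << 56;
	uint64_t low = addr & ((sign << 1) - 1);
	return (low ^ sign) - sign;
}

/* Hypothetical access_ok(): untag first, then range-check. */
static bool sketch_uai_access_ok(uint64_t addr, uint64_t size)
{
	uint64_t untagged = sketch_untag(addr);

	return size <= SKETCH_USER_LIMIT && untagged <= SKETCH_USER_LIMIT - size;
}

/* True if no 7-bit tag can move 'addr' into the assumed kernel range. */
static bool sketch_all_tags_below_kernel(uint64_t addr)
{
	for (unsigned int tag = 0; tag < 128; tag++) {
		if (sketch_set_tag(addr, tag) >= SKETCH_KERNEL_START)
			return false;
	}
	return true;
}
```

With bit 56 clear, the largest value any tag can produce is
0xfeffffff_ffffffff, which stays below the assumed kernel start. With
bit 56 set, an all-ones tag does land in the kernel range, which is
why the check has to key off that bit.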

> UAI looks like it wasn't intended to be context switched and, indeed,
> your series doesn't context switch it.  As far as I'm concerned, this
> is an error, and if we support UAI at all, we should context switch
> it.  Yes, this will be slow, perhaps painfully slow.  AMD knows how to
> fix it by, for example, reading the Intel SDM.  By *not* context
> switching UAI, we force it on for all user code, including
> unsuspecting user code, as well as for kernel code.  Do we actually
> want it on for kernel code?  With LAM, in contrast, the semantics for
> kernel pointers vs user pointers actually make sense and can be set
> per mm, which will make things like io_uring (in theory) do the right
> thing.

Arm64 does not context switch the hardware TBI feature either (and it
was always on from the start). One reason is that toggling it requires
expensive TLB maintenance. What we do context switch is the opt-in to
the tagged address ABI, which allows tagged pointers into the kernel.
That's purely a software choice (TIF flag) and it only affects the
access_ok() check.
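
Roughly, the software opt-in looks like the sketch below. The struct,
its tagged_addr_abi field and the sketch_* names are invented
stand-ins for the per-task TIF flag and the real access_ok(), and the
52-bit limit is again an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 52-bit user VA limit, as in the layout discussed earlier. */
#define SKETCH_TASK_SIZE (UINT64_C(1) << 52)

/* Hypothetical per-task state; stands in for the per-thread TIF flag. */
struct sketch_task {
	bool tagged_addr_abi;	/* set when the task opted in to the ABI */
};

/* Sign-extend from bit 55 so the top byte no longer carries a tag. */
static uint64_t sketch_untag(uint64_t addr)
{
	uint64_t sign = UINT64_C(1) << 55;
	uint64_t low = addr & ((sign << 1) - 1);
	return (low ^ sign) - sign;
}

/*
 * The hardware ignores the top byte for every task; only this software
 * check changes with the opt-in. Without it, a tagged pointer is
 * compared as-is and fails the range check.
 */
static bool sketch_access_ok(const struct sketch_task *task,
			     uint64_t addr, uint64_t size)
{
	assert(task != NULL);
	if (task->tagged_addr_abi)
		addr = sketch_untag(addr);

	return size <= SKETCH_TASK_SIZE && addr <= SKETCH_TASK_SIZE - size;
}
```

Nothing per-task is touched in hardware on a context switch here; the
only thing that follows the task is the boolean.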

With KASAN enabled, we enable the TBI feature for the kernel as well;
it is controlled independently of the user one.

> UAI and LAM are incompatible from a userspace perspective.  Since LAM
> is pretty clearly superior [0], it seems like a better long term
> outcome would be for programs that want tag bits to target LAM and for
> AMD to support LAM if there is demand.  For that matter, do we
> actually expect any userspace to want to support UAI?  (Are there
> existing too-clever sandboxes that would be broken by enabling UAI?)