Re: [PATCH v1 0/8] x86_64 SandBox Mode arch hooks

Petr Tesařík <petr@xxxxxxxxxxx> · Wed, 14 Feb 2024 20:14:15 +0100

On Wed, 14 Feb 2024 09:29:06 -0800
"H. Peter Anvin" <hpa@xxxxxxxxx> wrote:

> On February 14, 2024 8:41:43 AM PST, "Petr Tesařík" <petr@xxxxxxxxxxx> wrote:
> >On Wed, 14 Feb 2024 07:28:35 -0800
> >"H. Peter Anvin" <hpa@xxxxxxxxx> wrote:
> >  
> >> On February 14, 2024 6:52:53 AM PST, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:  
> >> >On 2/14/24 03:35, Petr Tesarik wrote:    
> >> >> This patch series implements x86_64 arch hooks for the generic SandBox
> >> >> Mode infrastructure.    
> >> >
> >> >I think I'm missing a bit of context here.  What does one _do_ with
> >> >SandBox Mode?  Why is it useful?    
> >> 
> >> Seriously. On the surface it looks like a really bad idea – basically an ad hoc, *more* privileged version of user space.
> >
> >Hi hpa,
> >
> >I agree that it kind of tries to do "user mode without user mode".
> >There are some differences from actual user mode:
> >
> >First, from a process management POV, sandbox mode appears to be
> >running in kernel mode. So, there is no way to use ptrace(2), send
> >malicious signals or otherwise interact with the sandbox. In fact,
> >the process can have three independent contexts: user mode, kernel mode
> >and sandbox mode.
> >
> >Second, a sandbox can run unmodified kernel code and interact directly
> >with other parts of the kernel. It's not really possible with this
> >initial patch series, but the plan is that sandbox mode can share locks
> >with the kernel.
> >
> >Third, sandbox code can be trusted for operations like parsing keys for
> >the trusted keychain if the kernel is locked down, i.e. when even a
> >process with UID 0 is not on the same trust level as kernel mode.
> >
> >HTH
> >Petr T
> >  
> 
> This, to me, seems like "all the downsides of a microkernel without the upsides." Furthermore, it breaks security-hardening features like LASS and (to a lesser degree) SMAP. Not to mention dropping global pages?

I must be missing something... But I am always open to learning something new.

I don't see how it breaks SMAP. Sandbox mode runs in its own address
space, which does not contain any user-mode pages. While running in
sandbox mode, user pages belong to the sandboxed code, and kernel pages
are used to enter/exit kernel mode. The bottom half of the PGD is empty,
and all user page translations are removed from the TLB.
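
To make the "bottom half of the PGD is empty" claim concrete, here is a
rough userspace sketch, not code from the series. It assumes 4-level
paging (512 PGD entries, each covering 2^39 bytes), and the helper names
pgd_index_of() and user_half_is_empty() are made up for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: x86_64 with 4-level paging. */
#define PTRS_PER_PGD        512
#define PGDIR_SHIFT         39
#define KERNEL_PGD_BOUNDARY (PTRS_PER_PGD / 2)	/* first kernel-half slot */

/* Hypothetical helper: which PGD slot translates this virtual address? */
static inline size_t pgd_index_of(uint64_t addr)
{
	return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/*
 * Hypothetical helper: true if no entry in the user half (slots 0..255)
 * is populated, i.e. no user-mode address can be translated through
 * this page table at all.
 */
static bool user_half_is_empty(const uint64_t pgd[PTRS_PER_PGD])
{
	for (size_t i = 0; i < KERNEL_PGD_BOUNDARY; i++) {
		if (pgd[i] != 0)
			return false;
	}
	return true;
}
```

With such a table, the highest canonical user address still maps to slot
255 and the lowest kernel-half address to slot 256, so an empty lower
half leaves nothing from the original user address space reachable.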

For a similar reason, I don't see right now how it breaks linear
address space separation. Even if it did, I believe I can take care of
it in the entry/exit path. Anyway, which branch contains the LASS
patches now, so I can test?

As for dropping global pages, that's only part of the story. Indeed,
patch 6/8 of the series sets CR4.PGE to zero to have a known-good
working state, but that code is removed again by patch 8/8. I wanted to
implement lazy TLB flushing separately, so it can be easily reverted if
it is suspected to cause an issue.

Plus, each sandbox mode can use PCID to reduce TLB flushing even more.
I haven't done it, because it would be a waste of time if the whole
concept is scratched.

I believe that only those global pages which are actually accessed by
the sandbox need to be flushed. Yes, some parts of the necessary logic
are missing in the current patch series. I can add them in a v2 series
if you wish.

> All in all, I cannot see this as anything other than an enormous step in the wrong direction, and it isn't even in the sense of "it is harmless if no one uses it" – you are introducing architectural changes that are most definitely *very* harmful both to maintainers and users.

I agree that it adds some burden. After all, that's why the ultimate
decision is up to you, the maintainers. To defend my cause, I hope you
have noticed that if CONFIG_SANDBOX_MODE is not set:

1. Literally nothing changes in entry_64.
2. sandbox_mode() always evaluates to false, so the added conditionals
   in fault.c and traps.c are never executed.
3. top_of_instr_stack() always returns current_top_of_stack(), which is
   equivalent to the code it replaces, namely
   this_cpu_read(pcpu_hot.top_of_stack).
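
As a rough illustration of points 2 and 3, the usual config-stub pattern
looks something like the sketch below. This is a userspace model, not
the code from the series: fake_pcpu_top_of_stack stands in for
pcpu_hot.top_of_stack, handle_fault_sketch() stands in for the
conditionals in fault.c and traps.c, and the CONFIG_SANDBOX_MODE=y
branch is a placeholder only.

```c
#include <stdbool.h>

/* Stand-in for this_cpu_read(pcpu_hot.top_of_stack); value is arbitrary. */
static unsigned long fake_pcpu_top_of_stack = 0x4000UL;

static inline unsigned long current_top_of_stack(void)
{
	return fake_pcpu_top_of_stack;
}

#ifdef CONFIG_SANDBOX_MODE
/* Placeholder state; the real implementation is not modelled here. */
static bool fake_in_sandbox;
static unsigned long fake_sandbox_stack_top = 0x8000UL;

static inline bool sandbox_mode(void)
{
	return fake_in_sandbox;
}

static inline unsigned long top_of_instr_stack(void)
{
	return sandbox_mode() ? fake_sandbox_stack_top
			      : current_top_of_stack();
}
#else
/* Config disabled: a constant false lets the compiler drop the branches. */
static inline bool sandbox_mode(void)
{
	return false;
}

static inline unsigned long top_of_instr_stack(void)
{
	return current_top_of_stack();
}
#endif

/* Hypothetical model of a conditional added to a fault/trap handler. */
static int handle_fault_sketch(void)
{
	if (sandbox_mode())
		return 1;	/* sandbox-specific handling */
	return 0;		/* unchanged kernel path */
}
```

Built without -DCONFIG_SANDBOX_MODE, handle_fault_sketch() can only take
the unchanged path, and top_of_instr_stack() is just another name for
current_top_of_stack().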

So, all the interesting stuff is under arch/x86/kernel/sbm/. Shall I
add a corresponding entry with my name to MAINTAINERS?

> To me, this feels like paravirtualization all over again. 20 years later we still have not been able to undo all the damage that did.

OK, I can follow you here. Indeed, there is some similarity with Xen PV
(running kernel code with CPL 3), but I don't think there's more than
this.

Petr T