Re: [LSF/MM/BPF TOPIC] Address Space Isolation

Brendan Jackman <jackmanb@xxxxxxxxxx> · Tue, 12 Mar 2024 17:45:11 +0100

On Tue, 12 Mar 2024 at 15:48, Petr Tesařík <petr@xxxxxxxxxxx> wrote:
> > - How we’ve solved the TLB flushing issues in sensitivity tracking, and
> > how it could be done better.
>
> Hello and welcome! I ran into a similar challenge with SandBox Mode. My
> solution was to run sandbox code with CPL=3 (on x86) and control page
> access with the U/S PTE bit rather than the P bit, which allowed me to
> implement lazy TLB invalidation. The x86 folks didn't like the idea...

Hmm, a similar idea might be to use protection keys. I'm not sure
that really works, though; we haven't given it any serious thought,
since not all CPUs support them. So that would be something to explore
as a later optimisation rather than a basic principle.
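
To make the U/S-versus-P distinction from the quoted paragraph concrete,
here is a toy model of the permission check. It is only a sketch, not
code from ASI or SandBox Mode: toy_read_allowed() is a hypothetical
helper, and it deliberately ignores SMAP/SMEP, protection keys, the
upper page-table levels and the TLB itself. Only the bit positions are
architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table entry flag bits (architectural positions). */
#define PTE_P  (UINT64_C(1) << 0)  /* Present */
#define PTE_US (UINT64_C(1) << 2)  /* User/Supervisor: 1 = user-accessible */

static_assert(PTE_US == 0x4, "U/S is bit 2 of the PTE");

/*
 * Hypothetical toy model: may a data read at privilege level `cpl`
 * proceed through this PTE? Ignores SMAP/SMEP, protection keys,
 * upper-level paging entries and any TLB state.
 */
static bool toy_read_allowed(uint64_t pte, int cpl)
{
	if (!(pte & PTE_P))
		return false;	/* not present: faults at any CPL */
	if (cpl == 3 && !(pte & PTE_US))
		return false;	/* supervisor-only page, user-mode access */
	return true;
}
```

In this model a page with P=1 and U/S=0 stays readable at CPL=0 but
faults at CPL=3, whereas clearing P makes it fault at every privilege
level.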

> For the record, SandBox Mode was designed with confidentiality in mind,
> although the initial patch series left out this part for simplicity. I
> wonder if your objective is to protect kernel data from user space, or
> if you have also considered decomposing the kernel into components that
> are isolated from each other (and then we could potentially find
> some synergies).

Yeah, that's something we've pondered. What I've presented here is
definitely about protecting the kernel from userspace/VM guests, but
it's a framework where you could conceivably isolate all sorts of
things. Maybe there's a world where ASI makes unprivileged BPF a more
viable notion.

The thing is, what I'm presenting here doesn't protect against
software bugs at all - if you can get the kernel to architecturally
access data and do something bad with it, ASI will happily remap that
data and branch back to the buggy code. That probably simplifies
things quite a lot as compared to SBM.

But yes, the whole "sensitivity tracking" thing does seem to share
requirements with SandBox Mode; I will need to ponder this some more.