Re: [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Fri, 26 Apr 2019 11:49:27 -0700

On Fri, 2019-04-26 at 10:40 -0700, Andy Lutomirski wrote:
> > On Apr 26, 2019, at 8:19 AM, James Bottomley
> > <James.Bottomley@hansenpartnership.com> wrote:
> > 
> > On Fri, 2019-04-26 at 08:07 -0700, Andy Lutomirski wrote:
> > > > On Apr 26, 2019, at 7:57 AM, James Bottomley
> > > > <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > On Fri, 2019-04-26 at 07:46 -0700, Dave Hansen wrote:
> > > > > On 4/25/19 2:45 PM, Mike Rapoport wrote:
> > > > > > After the isolated system call finishes, the mappings
> > > > > > created during its execution are cleared.
> > > > > 
> > > > > Yikes.  I guess that stops someone from calling write() a
> > > > > bunch of times on every filesystem using every block device
> > > > > driver and all the DM code to get a lot of code/data faulted
> > > > > in.  But, it also means not even long-running processes will
> > > > > ever have a chance of behaving anything close to normally.
> > > > > 
> > > > > Is this something you think can be rectified or is there
> > > > > something fundamental that would keep SCI page tables from
> > > > > being cached across different invocations of the same
> > > > > syscall?
> > > > 
> > > > There is some work being done to look at pre-populating the
> > > > isolated address space with the expected execution footprint of
> > > > the system call, yes.  It lessens the ROP gadget protection
> > > > slightly because you might find a gadget in the pre-populated
> > > > code, but it solves a lot of the overhead problem.
> > > 
> > > I’m not even remotely a ROP expert, but: what stops a ROP payload
> > > from using all the “fault-in” gadgets that exist — any function
> > > that can return on an error without doing too much will fault in
> > > the whole page containing the function.
> > 
> > The address space pre-population is still per syscall, so you don't
> > get access to the code footprint of a different syscall.  So the
> > isolated address space is created anew for every system call; it's
> > just pre-populated with that system call's expected footprint.
> 
> That’s not what I mean. Suppose I want to use a ROP gadget in
> vmalloc(), but vmalloc isn’t in the page tables. Then first push
> vmalloc itself into the stack. As long as RDI contains a sufficiently
> ridiculous value, it should just return without doing anything. And
> it can return right back into the ROP gadget, which is now available.

Yes, it's not perfect, but stack space for a smashing attack is at a
premium, and now you need two stack frames for every gadget you chain
instead of one, so we've halved your ability to chain gadgets.
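
As a back-of-the-envelope sketch of that halving (the function name and
all numbers are hypothetical, and it ignores any extra slots a gadget
needs for its own arguments):

```c
#include <stddef.h>

/*
 * How many gadgets fit in an overflow of `budget` bytes when every
 * gadget costs `slots_per_gadget` return-address slots of `slot_size`
 * bytes each.  Purely illustrative arithmetic, not a measurement.
 */
size_t max_gadgets(size_t budget, size_t slot_size, size_t slots_per_gadget)
{
    if (slot_size == 0 || slots_per_gadget == 0)
        return 0;   /* avoid dividing by zero on nonsense input */
    return budget / (slot_size * slots_per_gadget);
}
```

With an assumed 256-byte budget and 8-byte return addresses,
max_gadgets(256, 8, 1) is 32 and max_gadgets(256, 8, 2) is 16: the
extra "fault it in first" entry costs one additional slot per gadget.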

> > > To improve this, we would want something that would try to check
> > > whether the caller is actually supposed to call the callee, which
> > > is more or less the hard part of CFI.  So can’t we just do CFI
> > > and call it a day?
> > 
> > By CFI you mean control flow integrity?  In theory I believe so,
> > yes, but in practice doesn't it require a lot of semantic object
> > information which is easy to get from higher level languages like
> > Java but a bit more difficult for plain C?
> 
> Yes. As I understand it, grsecurity instruments gcc to create some
> kind of hash of all function signatures. Then any indirect call can
> effectively verify that it’s calling a function of the right type.
> And every return verifies a cookie.
> 
> On CET CPUs, RET gets checked directly, and I don’t see the benefit
> of SCI.

Presumably you know something I don't, but I thought CET CPUs had been
planned for release for ages but had not actually been released yet?
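
For the software scheme, my understanding of the indirect call check is
roughly the following. This is a minimal user-space sketch only: the
tags are hand-assigned constants where a compiler plugin would derive
them from the prototype, every name in it is hypothetical, and it is
not how grsecurity actually implements it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-prototype tags; a real scheme would hash the type. */
#define TAG_INT_INT   0x5a17c3e1u   /* int (*)(int)   */
#define TAG_VOID_VOID 0x9be402d7u   /* void (*)(void) */

struct tagged_fn {
    uint32_t tag;       /* tag of the function's real prototype */
    void (*fn)(void);   /* stored generically, cast back before the call */
};

static int add_one(int x) { return x + 1; }
static void noop(void) { }

const struct tagged_fn fn_add_one = { TAG_INT_INT, (void (*)(void))add_one };
const struct tagged_fn fn_noop    = { TAG_VOID_VOID, noop };

/*
 * Checked indirect call through an int (*)(int) call site.
 * Returns 0 and stores the result in *out when the callee's tag matches
 * the call site's expected type; returns -1 without calling otherwise.
 */
int checked_call_int_int(const struct tagged_fn *t, int arg, int *out)
{
    if (t == NULL || t->tag != TAG_INT_INT)
        return -1;
    *out = ((int (*)(int))t->fn)(arg);
    return 0;
}
```

Calling checked_call_int_int() with &fn_add_one goes through, while
&fn_noop is refused because its tag does not match the call site. Note
that this only constrains the type of the target, not whether this
particular caller is supposed to reach this particular callee.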

> > > On top of that, a robust, maintainable implementation of this
> > > thing seems very complicated — for example, what happens if
> > > vfree() gets called?
> > 
> > Address space local vs global object tracking is another thing on
> > our list.  What we'd probably do is verify the global object was
> > allowed to be freed and then hand it off safely to the main kernel
> > address space.
> 
> This seems exceedingly complicated.

It's a research project: we're exploring what's possible so we can
choose the techniques that give the best security improvement for the
additional overhead.

James