Re: [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Fri, 26 Apr 2019 08:19:21 -0700

On Fri, 2019-04-26 at 08:07 -0700, Andy Lutomirski wrote:
> > On Apr 26, 2019, at 7:57 AM, James Bottomley
> > <James.Bottomley@hansenpartnership.com> wrote:
> > 
> > > On Fri, 2019-04-26 at 07:46 -0700, Dave Hansen wrote:
> > > > On 4/25/19 2:45 PM, Mike Rapoport wrote:
> > > > After the isolated system call finishes, the mappings created
> > > > during its execution are cleared.
> > > 
> > > Yikes.  I guess that stops someone from calling write() a bunch
> > > of times on every filesystem using every block device driver and
> > > all the DM code to get a lot of code/data faulted in.  But, it
> > > also means not even long-running processes will ever have a
> > > chance of behaving anything close to normally.
> > > 
> > > Is this something you think can be rectified or is there
> > > something fundamental that would keep SCI page tables from being
> > > cached across different invocations of the same syscall?
> > 
> > There is some work being done to look at pre-populating the
> > isolated address space with the expected execution footprint of the
> > system call, yes.  It lessens the ROP gadget protection slightly
> > because you might find a gadget in the pre-populated code, but it
> > solves a lot of the overhead problem.
> > 
> 
> I’m not even remotely a ROP expert, but: what stops a ROP payload
> from using all the “fault-in” gadgets that exist — any function that
> can return on an error without doing too much will fault in the whole
> page containing the function.

The address space pre-population is still per syscall, so you don't get
access to the code footprint of a different syscall.  The isolated
address space is still created anew for every system call; it's just
pre-populated with that system call's expected footprint.
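
To make that concrete, here is a rough user-space sketch of the idea.
It is a toy model only, not code from the patch set: the names
(sci_ctx, sci_enter, sci_access_ok, sci_exit), the page numbers and the
footprint table are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_PAGES    16  /* toy model: kernel text is 16 pages */
#define NR_SYSCALLS 2   /* toy model: two system calls */

/* Hypothetical expected footprint: pages each syscall may touch. */
static const bool footprint[NR_SYSCALLS][NR_PAGES] = {
	[0] = { [1] = true, [2] = true },
	[1] = { [2] = true, [7] = true, [8] = true },
};

/* Toy stand-in for the isolated page tables of one syscall. */
struct sci_ctx {
	bool mapped[NR_PAGES];
};

/* Syscall entry: start from scratch, map only this call's footprint. */
static void sci_enter(struct sci_ctx *ctx, int nr)
{
	assert(nr >= 0 && nr < NR_SYSCALLS);
	memcpy(ctx->mapped, footprint[nr], sizeof(ctx->mapped));
}

/* Would an access to this page succeed without a fault? */
static bool sci_access_ok(const struct sci_ctx *ctx, size_t page)
{
	return page < NR_PAGES && ctx->mapped[page];
}

/* Syscall exit: drop every mapping, nothing survives to the next call. */
static void sci_exit(struct sci_ctx *ctx)
{
	memset(ctx->mapped, 0, sizeof(ctx->mapped));
}
```

In this model a gadget on page 7 is reachable from syscall 1 but not
from syscall 0, and after sci_exit() nothing is reachable until the
next sci_enter() repopulates the context.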

> To improve this, we would want some thing that would try to check
> whether the caller is actually supposed to call the callee, which is
> more or less the hard part of CFI.  So can’t we just do CFI and call
> it a day?

By CFI you mean control flow integrity?  In theory I believe so, yes,
but in practice doesn't it require a lot of semantic object information,
which is easy to get from higher-level languages like Java but a bit
more difficult for plain C?

> On top of that, a robust, maintainable implementation of this thing
> seems very complicated — for example, what happens if vfree() gets
> called?

Tracking address-space-local vs. global objects is another thing on our
list.  What we'd probably do is verify that the global object was
allowed to be freed and then hand it off safely to the main kernel
address space.
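
As a very rough sketch of what that hand-off might look like (purely
hypothetical: the types, the free_allowed policy bit and the queue are
invented here for illustration, and nothing like this exists in the
patch set yet):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum obj_scope { OBJ_LOCAL, OBJ_GLOBAL };

struct tracked_obj {
	enum obj_scope scope;
	bool free_allowed;        /* hypothetical policy decision */
	bool freed;               /* stands in for the real release */
	struct tracked_obj *next; /* link on the hand-off queue */
};

/* Global objects waiting to be freed by the main kernel address space. */
static struct tracked_obj *handoff_queue;

/* Called from the isolated context; returns false if the free is refused. */
static bool sci_free(struct tracked_obj *obj)
{
	assert(obj != NULL);
	if (obj->scope == OBJ_LOCAL) {
		obj->freed = true;  /* local object: release it right away */
		return true;
	}
	if (!obj->free_allowed)
		return false;       /* global object failed verification */
	obj->next = handoff_queue;  /* defer the free to the main kernel */
	handoff_queue = obj;
	return true;
}

/* Runs in the main kernel address space; returns how many were freed. */
static size_t main_kernel_drain(void)
{
	size_t n = 0;

	while (handoff_queue) {
		struct tracked_obj *obj = handoff_queue;

		handoff_queue = obj->next;
		obj->next = NULL;
		obj->freed = true;
		n++;
	}
	return n;
}
```

The point of the split is that the isolated context never frees a
global object itself; it only queues a verified request.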

James