> On Apr 26, 2019, at 8:19 AM, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 2019-04-26 at 08:07 -0700, Andy Lutomirski wrote:
>>> On Apr 26, 2019, at 7:57 AM, James Bottomley
>>> <James.Bottomley@hansenpartnership.com> wrote:
>>>
>>>>> On Fri, 2019-04-26 at 07:46 -0700, Dave Hansen wrote:
>>>>> On 4/25/19 2:45 PM, Mike Rapoport wrote:
>>>>> After the isolated system call finishes, the mappings created
>>>>> during its execution are cleared.
>>>>
>>>> Yikes. I guess that stops someone from calling write() a bunch
>>>> of times on every filesystem using every block device driver and
>>>> all the DM code to get a lot of code/data faulted in. But, it
>>>> also means not even long-running processes will ever have a
>>>> chance of behaving anything close to normally.
>>>>
>>>> Is this something you think can be rectified or is there
>>>> something fundamental that would keep SCI page tables from being
>>>> cached across different invocations of the same syscall?
>>>
>>> There is some work being done to look at pre-populating the
>>> isolated address space with the expected execution footprint of the
>>> system call, yes. It lessens the ROP gadget protection slightly
>>> because you might find a gadget in the pre-populated code, but it
>>> solves a lot of the overhead problem.
>>
>> I’m not even remotely a ROP expert, but: what stops a ROP payload
>> from using all the “fault-in” gadgets that exist -- any function that
>> can return on an error without doing too much will fault in the whole
>> page containing the function.
>
> The address space pre-population is still per syscall, so you don't get
> access to the code footprint of a different syscall. So the isolated
> address space is created anew for every system call, it's just
> pre-populated with that system call's expected footprint.

That’s not what I mean. Suppose I want to use a ROP gadget in
vmalloc(), but vmalloc isn’t in the page tables. Then first push the
address of vmalloc itself onto the stack. As long as RDI contains a
sufficiently ridiculous value, it should just return without doing
anything. And it can return right back into the ROP gadget, which is
now available.

>
>> To improve this, we would want some thing that would try to check
>> whether the caller is actually supposed to call the callee, which is
>> more or less the hard part of CFI. So can’t we just do CFI and call
>> it a day?
>
> By CFI you mean control flow integrity? In theory I believe so, yes,
> but in practice doesn't it require a lot of semantic object information
> which is easy to get from higher level languages like java but a bit
> more difficult for plain C.

Yes. As I understand it, grsecurity instruments gcc to create some
kind of hash of all function signatures. Then any indirect call can
effectively verify that it’s calling a function of the right type. And
every return verifies a cookie.

On CET CPUs, RET gets checked directly, and I don’t see the benefit of
SCI.

>
>> On top of that, a robust, maintainable implementation of this thing
>> seems very complicated -- for example, what happens if vfree() gets
>> called?
>
> Address space Local vs global object tracking is another thing on our
> list. What we'd probably do is verify the global object was allowed to
> be freed and then hand it off safely to the main kernel address space.
>

This seems exceedingly complicated.
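
To make the type-hash idea above concrete, here is a rough userspace
sketch in C of an indirect call that checks a signature tag before
jumping. Everything in it is an assumption for illustration: the tag
values, struct tagged_fn and checked_call_int_int() are made-up names,
and a real scheme would have the compiler emit the tags and the check
rather than doing it by hand. It is not meant to describe how
grsecurity or any compiler actually implements this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical signature tags. A real scheme would have the compiler
 * derive these from each function's type; the values here are made up.
 */
#define SIG_INT_INT  0x1f3a9c01u	/* int (*)(int) */
#define SIG_VOID_PTR 0x7be4d2a5u	/* void (*)(void *) */

/* A function pointer stored together with the tag of its real type. */
struct tagged_fn {
	uint32_t sig;
	void (*fn)(void);
};

static int add_one(int x)
{
	return x + 1;
}

static void release(void *p)
{
	(void)p;
}

/* "good" has the type the call site expects, "bad" does not. */
static const struct tagged_fn good = { SIG_INT_INT, (void (*)(void))add_one };
static const struct tagged_fn bad = { SIG_VOID_PTR, (void (*)(void))release };

/*
 * Indirect call site that expects an int (*)(int). The call is made only
 * if the tag matches; returns 0 on success, -1 on a type mismatch.
 */
static int checked_call_int_int(const struct tagged_fn *t, int arg, int *out)
{
	assert(t != NULL && out != NULL);
	if (t->sig != SIG_INT_INT)
		return -1;	/* a real implementation would trap here */
	*out = ((int (*)(int))t->fn)(arg);
	return 0;
}
```

With these definitions, checked_call_int_int(&good, 41, &out) returns 0
and sets out to 42, while passing &bad returns -1 without making the
call. Note that a check like this only constrains the function type, so
any function with a matching signature is still a valid target.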