> On Apr 26, 2019, at 11:49 AM, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Fri, 2019-04-26 at 10:40 -0700, Andy Lutomirski wrote:
>>> On Apr 26, 2019, at 8:19 AM, James Bottomley
>>> <James.Bottomley@hansenpartnership.com> wrote:
>>>
>>> On Fri, 2019-04-26 at 08:07 -0700, Andy Lutomirski wrote:
>>>>> On Apr 26, 2019, at 7:57 AM, James Bottomley
>>>>> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>>>
>>>>>>> On Fri, 2019-04-26 at 07:46 -0700, Dave Hansen wrote:
>>>>>>> On 4/25/19 2:45 PM, Mike Rapoport wrote:
>>>>>>> After the isolated system call finishes, the mappings
>>>>>>> created during its execution are cleared.
>>>>>>
>>>>>> Yikes. I guess that stops someone from calling write() a
>>>>>> bunch of times on every filesystem using every block device
>>>>>> driver and all the DM code to get a lot of code/data faulted
>>>>>> in. But, it also means not even long-running processes will
>>>>>> ever have a chance of behaving anything close to normally.
>>>>>>
>>>>>> Is this something you think can be rectified or is there
>>>>>> something fundamental that would keep SCI page tables from
>>>>>> being cached across different invocations of the same
>>>>>> syscall?
>>>>>
>>>>> There is some work being done to look at pre-populating the
>>>>> isolated address space with the expected execution footprint of
>>>>> the system call, yes. It lessens the ROP gadget protection
>>>>> slightly because you might find a gadget in the pre-populated
>>>>> code, but it solves a lot of the overhead problem.
>>>>
>>>> I’m not even remotely a ROP expert, but: what stops a ROP payload
>>>> from using all the “fault-in” gadgets that exist — any function
>>>> that can return on an error without doing too much will fault in
>>>> the whole page containing the function.
>>>
>>> The address space pre-population is still per syscall, so you don't
>>> get access to the code footprint of a different syscall.
>>> So the isolated address space is created anew for every system call, it's
>>> just pre-populated with that system call's expected footprint.
>>
>> That’s not what I mean. Suppose I want to use a ROP gadget in
>> vmalloc(), but vmalloc isn’t in the page tables. Then first push
>> vmalloc itself into the stack. As long as RDI contains a sufficiently
>> ridiculous value, it should just return without doing anything. And
>> it can return right back into the ROP gadget, which is now available.
>
> Yes, it's not perfect, but stack space for a smashing attack is at a
> premium and now you need two stack frames for every gadget you chain
> instead of one so we've halved your ability to chain gadgets.
>
>>>> To improve this, we would want something that would try to check
>>>> whether the caller is actually supposed to call the callee, which
>>>> is more or less the hard part of CFI. So can’t we just do CFI
>>>> and call it a day?
>>>
>>> By CFI you mean control flow integrity? In theory I believe so,
>>> yes, but in practice doesn't it require a lot of semantic object
>>> information which is easy to get from higher level languages like
>>> Java but a bit more difficult for plain C.
>>
>> Yes. As I understand it, grsecurity instruments gcc to create some
>> kind of hash of all function signatures. Then any indirect call can
>> effectively verify that it’s calling a function of the right type.
>> And every return verifies a cookie.
>>
>> On CET CPUs, RET gets checked directly, and I don’t see the benefit
>> of SCI.
>
> Presumably you know something I don't but I thought CET CPUs had been
> planned for release for ages, but not actually released yet?

I don’t know any secrets about this, but I don’t think it’s released.
Last I checked, it didn’t even have a final public spec.

>
>>>> On top of that, a robust, maintainable implementation of this
>>>> thing seems very complicated — for example, what happens if
>>>> vfree() gets called?
>>>
>>> Address space local vs global object tracking is another thing on
>>> our list. What we'd probably do is verify the global object was
>>> allowed to be freed and then hand it off safely to the main kernel
>>> address space.
>>
>> This seems exceedingly complicated.
>
> It's a research project: we're exploring what's possible so we can
> choose the techniques that give the best security improvement for the
> additional overhead.
>

:)
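
For readers following the thread, here is a minimal userspace sketch of the signature-hash idea mentioned above: every indirect call checks that the target carries the hash of the expected function type before jumping to it. This is only an illustration under stated assumptions, not how grsecurity or any compiler actually implements it. The names `tagged_fn`, `sig_hash`, `checked_call_long`, `add_one` and `fake_alloc` are all hypothetical, the signature strings are written by hand, and FNV-1a merely stands in for whatever hash a real toolchain would compute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a function pointer tagged with a hash of its type
 * signature. A real scheme would have the compiler emit the tag. */
typedef struct {
    uint32_t type_hash;   /* hash of the callee's signature string */
    void   (*fn)(void);   /* generic pointer, cast back before the call */
} tagged_fn;

/* FNV-1a over a hand-written signature string (an assumption made for
 * this sketch; a compiler would derive this from the real type). */
static uint32_t sig_hash(const char *sig)
{
    uint32_t h = 2166136261u;
    for (; *sig; ++sig) {
        h ^= (unsigned char)*sig;
        h *= 16777619u;
    }
    return h;
}

/* Two example callees with different signatures. */
static long add_one(long x) { return x + 1; }
static void *fake_alloc(size_t n) { (void)n; return NULL; }

/* Checked indirect call for callees of type long(long).
 * Returns 0 and stores the result in *out when the tag matches,
 * or -1 without calling anything when it does not. */
static int checked_call_long(tagged_fn t, long arg, long *out)
{
    assert(out != NULL);
    if (t.type_hash != sig_hash("long(long)"))
        return -1;                       /* wrong type: refuse the call */
    *out = ((long (*)(long))t.fn)(arg);  /* cast back to the real type */
    return 0;
}
```

Calling `checked_call_long()` with a `tagged_fn` built from `add_one` and `sig_hash("long(long)")` goes through, while one built from `fake_alloc` and `sig_hash("void*(size_t)")` is refused. Note that a check like this only constrains the type of the callee: any function with a matching signature still passes, which is the coarse-grained part of the "hard part of CFI" raised above.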