On Thu, Apr 25, 2019 at 05:30:13PM -0700, Andy Lutomirski wrote:
> On Thu, Apr 25, 2019 at 2:46 PM Mike Rapoport <rppt@xxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > Address space isolation has been used to protect the kernel from the
> > userspace and userspace programs from each other since the invention of the
> > virtual memory.
> >
> > Assuming that kernel bugs and therefore vulnerabilities are inevitable it
> > might be worth isolating parts of the kernel to minimize damage that these
> > vulnerabilities can cause.
> >
> > The idea here is to allow an untrusted user access to a potentially
> > vulnerable kernel in such a way that any kernel vulnerability they find to
> > exploit is either prevented or the consequences confined to their isolated
> > address space such that the compromise attempt has minimal impact on other
> > tenants or the protected structures of the monolithic kernel.  Although we
> > hope to prevent many classes of attack, the first target we're looking at
> > is ROP gadget protection.
> >
> > These patches implement a "system call isolation (SCI)" mechanism that
> > allows running system calls in an isolated address space with reduced page
> > tables to prevent ROP attacks.
> >
> > ROP attacks involve corrupting the stack return address to repoint it to a
> > segment of code you know exists in the kernel that can be used to perform
> > the action you need to exploit the system.
> >
> > The idea behind the prevention is that if we fault in pages in the
> > execution path, we can compare target address against the kernel symbol
> > table.  So if we're in a function, we allow local jumps (and simply falling
> > of the end of a page) but if we're jumping to a new function it must be to
> > an external label in the symbol table.
>
> That's quite an assumption.  The entry code at least uses .L labels.
> Do you get that right?
>
> As far as I can see, most of what's going on here has very little to
> do with jumps and calls.
> The benefit seems to come from making sure
> that the RET instruction actually goes somewhere that's already been
> faulted in.  Am I understanding right?

Well, RET indeed will go somewhere that's already been faulted in. But
before that, the first CALL to not-yet-mapped code will fault and bring in
the page containing the CALL target.

If the CALL is made into the middle of a function, SCI will refuse to
continue the syscall execution.

As for the local jumps, as long as they are inside a page that was already
mapped or the next page, they are allowed.

This does not take care (yet) of larger functions where local jumps are
farther than PAGE_SIZE.

Here's an example trace of #PF's produced by a dummy get_answer system call
from patch 7:

[   12.012906] #PF: DATA: do_syscall_64+0x26b/0x4c0 fault at 0xffffffff82000bb8
[   12.012918] #PF: INSN: __x86_indirect_thunk_rax+0x0/0x20 fault at __x86_indirect_thunk_rax+0x0/0x20
[   12.012929] #PF: INSN: __x64_sys_get_answer+0x0/0x10 fault at __x64_sys_get_answer+0x0/0x10

> --Andy
>

--
Sincerely yours,
Mike.
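
For illustration only, the instruction-fault policy described in this
message (allow a fault that lands in the same or the next page, otherwise
require the target to be a function entry from the symbol table) could be
sketched in user-space C roughly as below. This is not the code from the
patches. All names here (sketch_symtab, sketch_fault_allowed,
SKETCH_PAGE_SIZE) are hypothetical, a 4 KiB page size is assumed, and the
sketch does not distinguish CALL, JMP and RET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on a typical x86-64 configuration. */
#define SKETCH_PAGE_SIZE UINT64_C(4096)

/* Hypothetical stand-in for the kernel symbol table: function entry addresses. */
struct sketch_symtab {
	const uint64_t *entries;
	size_t count;
};

/* Round an address down to the start of its page. */
static uint64_t sketch_page_of(uint64_t addr)
{
	return addr & ~(SKETCH_PAGE_SIZE - 1);
}

/* True if target is exactly the start of a known function. */
static bool sketch_is_function_entry(const struct sketch_symtab *tab,
				     uint64_t target)
{
	for (size_t i = 0; i < tab->count; i++)
		if (tab->entries[i] == target)
			return true;
	return false;
}

/*
 * Decide whether an instruction fault at 'target', raised while executing
 * code at 'from', may be resolved by mapping the target page.
 */
static bool sketch_fault_allowed(const struct sketch_symtab *tab,
				 uint64_t from, uint64_t target)
{
	assert(tab != NULL);

	uint64_t from_page = sketch_page_of(from);
	uint64_t target_page = sketch_page_of(target);

	/* Local jump, or simply running off the end of the current page. */
	if (target_page == from_page ||
	    target_page == from_page + SKETCH_PAGE_SIZE)
		return true;

	/* Anything farther away must land on a function entry. */
	return sketch_is_function_entry(tab, target);
}
```

With a table holding entries at 0x1000 and 0x5000, a fault from 0x1010 to
0x5000 would be allowed, while a fault from 0x1010 to 0x5008 (the middle of
a function) would be refused. The limitation mentioned above is visible
here as well: a local jump farther than one page inside a large function
is rejected by this sketch.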