Re: [RFC PATCH v9 12/13] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Fri, 5 Apr 2019 10:32:17 -0600

> On Apr 5, 2019, at 9:56 AM, Tycho Andersen <tycho@xxxxxxxx> wrote:
> 
>> On Fri, Apr 05, 2019 at 09:24:50AM -0600, Andy Lutomirski wrote:
>> 
>> 
>>> On Apr 5, 2019, at 8:44 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>>> 
>>> On 4/5/19 12:17 AM, Thomas Gleixner wrote:
>>>>> process. Is that an acceptable trade-off?
>>>> You are not seriously asking whether creating a user controllable ret2dir
>>>> attack window is an acceptable trade-off? April 1st was a few days ago.
>>> 
>>> Well, let's not forget that this set at least takes us from "always
>>> vulnerable to ret2dir" to a choice between:
>>> 
>>> 1. fast-ish and "vulnerable to ret2dir for a user-controllable window"
>>> 2. slow and "mitigated against ret2dir"
>>> 
>>> Sounds like we need a mechanism that will do the deferred XPFO TLB
>>> flushes whenever the kernel is entered, and not _just_ at context switch
>>> time.  This permits an app to run in userspace with stale kernel TLB
>>> entries as long as it wants... that's harmless.
>> 
>> I don’t think this is good enough. The bad guys can enter the kernel and arrange for the kernel to wait, *in kernel*, for long enough to set up the attack.  userfaultfd is the most obvious way, but there are plenty. I suppose we could do the flush at context switch *and* entry.  I bet that performance still utterly sucks, though — on many workloads, this turns every entry into a full flush, and we already know exactly how much that sucks — it’s identical to KPTI without PCID.  (And yes, if we go this route, we need to merge this logic together — we shouldn’t write CR3 twice on entry).
>> 
>> I feel like this whole approach is misguided. ret2dir is not such a game changer that fixing it is worth huge slowdowns. I think all this effort should be spent on some kind of sensible CFI. For example, we should be able to mostly squash ret2anything by inserting a check that the high bits of RSP match the value on the top of the stack before any code that pops RSP.  On an FPO build, there aren’t all that many hot POP RSP instructions, I think.
>> 
>> (Actually, checking the bits is suboptimal. Do:
>> 
>> unsigned long offset = *rsp - rsp;
>> offset >>= THREAD_SHIFT;
>> if (unlikely(offset))
>>         BUG();
>> POP RSP;
> 
> This is a neat trick, and definitely prevents going random places in
> the heap. But,
> 
>> This means that it’s also impossible to trick a function into returning into a buffer that is on that function’s stack.)
> 
> Why is this true? All you're checking is that you can't shift the
> "location" of the stack. If you can inject stuff into a stack buffer,
> can't you just inject the right frame to return to your code as well,
> so you don't have to shift locations?
> 
> 

But the injected ROP payload will be *below* RSP, so you’ll need a gadget that can decrement RSP.  This makes the attack a good deal harder.

Something like RAP on top, or CET, will make this even harder.
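
To illustrate why a payload below RSP fails the check, here is a minimal user-space sketch in C. It is not code from the patch set: the value THREAD_SHIFT = 14 (16 KiB stacks) and the helper name pop_rsp_allowed() are assumptions made for illustration. Because the subtraction is unsigned, a saved value below the current RSP wraps around to a huge offset and is rejected, as is anything one stack size or more above it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed for illustration: 16 KiB kernel stacks, i.e. a shift of 14. */
#define THREAD_SHIFT 14

/*
 * Hypothetical helper modelling the check quoted above.
 * rsp   - current stack pointer value
 * saved - value at the top of the stack that is about to be popped into RSP
 *
 * Returns true only if rsp <= saved < rsp + (1 << THREAD_SHIFT).
 * If saved < rsp, the unsigned subtraction wraps and the high bits are set.
 */
bool pop_rsp_allowed(uint64_t rsp, uint64_t saved)
{
    uint64_t offset = saved - rsp;

    return (offset >> THREAD_SHIFT) == 0;
}
```

With an illustrative rsp of 0xffffc90000003e00, a saved value of rsp + 0x100 passes, while rsp - 0x80 (a buffer in the current function's frame) and an unrelated address such as 0xffff888012345000 are both rejected.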

> 
> Tycho