Re: [RFC PATCH v9 12/13] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Fri, 5 Apr 2019 10:27:05 -0600

> On Apr 5, 2019, at 10:01 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> 
> On 4/5/19 8:24 AM, Andy Lutomirski wrote:
>>> Sounds like we need a mechanism that will do the deferred XPFO TLB 
>>> flushes whenever the kernel is entered, and not _just_ at context
>>> switch time.  This permits an app to run in userspace with stale
>>> kernel TLB entries as long as it wants... that's harmless.
> ...
>> I suppose we could do the flush at context switch *and*
>> entry.  I bet that performance still utterly sucks, though — on many
>> workloads, this turns every entry into a full flush, and we already
>> know exactly how much that sucks — it’s identical to KPTI without
>> PCID.  (And yes, if we go this route, we need to merge this logic
>> together — we shouldn’t write CR3 twice on entry).
> 
> Yeah, probably true.
> 
> Just eyeballing this, it would mean mapping the "cpu needs deferred
> flush" variable into the cpu_entry_area, which doesn't seem too awful.
> 
> I think the basic overall concern is that the deferred flush leaves too
> many holes and by the time we close them sufficiently, performance will
> suck again.  Seems like a totally valid concern, but my crystal ball is
> hazy on whether it will be worth it in the end to many folks.
> 
> ...
>> In other words, I think that ret2dir is an insufficient justification
>> for XPFO.
> 
> Yeah, other things that it is good for have kinda been lost in the
> noise.  I think I first started looking at this long before Meltdown and
> L1TF were public.
> 
> There are hypervisors out there that simply don't (persistently) map
> user data.  They can't leak user data because they don't even have
> access to it in their virtual address space.  Those hypervisors had a
> much easier time with L1TF mitigation than we did.  Basically, they
> could flush the L1 after user data was accessible instead of before
> untrusted guest code runs.
> 
> My hope is that XPFO could provide us similar protection.  But,
> somebody's got to poke at it for a while to see how far they can push it.
> 
> IMNHO, XPFO is *always* going to be painful for kernel compiles.  But,
> cloud providers aren't doing a lot of kernel compiles on their KVM
> hosts, and they deeply care about leaking their users' data.

At the risk of asking a stupid question: we already have a mechanism
for this, namely highmem.  Can we enable highmem on x86_64, maybe with
some heuristics to make it work well?