> On Apr 5, 2019, at 10:01 AM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 4/5/19 8:24 AM, Andy Lutomirski wrote:
>>> Sounds like we need a mechanism that will do the deferred XPFO TLB
>>> flushes whenever the kernel is entered, and not _just_ at context
>>> switch time. This permits an app to run in userspace with stale
>>> kernel TLB entries as long as it wants... that's harmless.
> ...
>> I suppose we could do the flush at context switch *and*
>> entry. I bet that performance still utterly sucks, though — on many
>> workloads, this turns every entry into a full flush, and we already
>> know exactly how much that sucks — it’s identical to KPTI without
>> PCID. (And yes, if we go this route, we need to merge this logic
>> together — we shouldn’t write CR3 twice on entry).
>
> Yeah, probably true.
>
> Just eyeballing this, it would mean mapping the "cpu needs deferred
> flush" variable into the cpu_entry_area, which doesn't seem too awful.
>
> I think the basic overall concern is that the deferred flush leaves too
> many holes and by the time we close them sufficiently, performance will
> suck again. Seems like a totally valid concern, but my crystal ball is
> hazy on whether it will be worth it in the end to many folks
>
> ...
>> In other words, I think that ret2dir is an insufficient justification
>> for XPFO.
>
> Yeah, other things that it is good for have kinda been lost in the
> noise. I think I first started looking at this long before Meltdown and
> L1TF were public.
>
> There are hypervisors out there that simply don't (persistently) map
> user data. They can't leak user data because they don't even have
> access to it in their virtual address space. Those hypervisors had a
> much easier time with L1TF mitigation than we did. Basically, they
> could flush the L1 after user data was accessible instead of before
> untrusted guest code runs.
>
> My hope is that XPFO could provide us similar protection. But,
> somebody's got to poke at it for a while to see how far they can push it.
>
> IMNHO, XPFO is *always* going to be painful for kernel compiles. But,
> cloud providers aren't doing a lot of kernel compiles on their KVM
> hosts, and they deeply care about leaking their users' data.

At the risk of asking stupid questions: we already have a mechanism for this: highmem. Can we enable highmem on x86_64, maybe with some heuristics to make it work well?