Re: [RFC PATCH v9 12/13] xpfo, mm: Defer TLB flushes for non-current CPUs (x86 only)

Thomas Gleixner <tglx@xxxxxxxxxxxxx> · Fri, 5 Apr 2019 09:17:17 +0200 (CEST)

On Thu, 4 Apr 2019, Khalid Aziz wrote:
> When xpfo unmaps a page from physmap only (after mapping the page in
> userspace in response to an allocation request from userspace) on one
> processor, there is a small window of opportunity for ret2dir attack on
> other cpus until the TLB entry in physmap for the unmapped pages on
> other cpus is cleared. Forcing that to happen synchronously is the
> expensive part. Many of these requests can come in over a very
> short time across multiple processors resulting in every cpu asking
> every other cpu to flush TLB just to close this small window of
> vulnerability in the kernel. If each request is processed synchronously,
> each CPU will do multiple TLB flushes in short order. If we could
> consolidate these TLB flush requests instead and do one TLB flush on
> each cpu at the time of context switch, we can reduce the performance
> impact significantly. This is borne out in practice by measuring system
> time during a parallel kernel build on a large server. Without this,
> system time on a 96-core server when doing "make -j60 all" went up 26x.
> After this optimization, the impact went down to 1.44x.
> 
> The trade-off with this strategy is, the kernel on a cpu is vulnerable
> for a short time if the currently running process is the malicious

The "short" time to the next context switch on the other CPUs is how short
exactly? Anything from 1us to seconds (think NOHZ FULL), and even without
that, 10ms on a HZ=100 kernel is plenty of time to launch an attack.
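
To make the window concrete, here is a minimal userspace sketch of the
deferred scheme as described in the quoted text. It is not the patch's
code: NR_CPUS, flush_pending, flushes_done and all function names are
hypothetical, and a "flush" is only a counter. It assumes the unmapping
CPU flushes locally at once and every other CPU flushes at its next
context switch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical small machine */

/* true while a CPU may still hold a stale physmap TLB entry */
static bool flush_pending[NR_CPUS];
/* number of (modelled) TLB flushes each CPU has performed */
static unsigned int flushes_done[NR_CPUS];

/* Deferred scheme: flush on this CPU now, only mark the others. */
static void model_unmap_deferred(int this_cpu)
{
	assert(this_cpu >= 0 && this_cpu < NR_CPUS);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (cpu == this_cpu)
			flushes_done[cpu]++;		/* immediate local flush */
		else
			flush_pending[cpu] = true;	/* stale entry stays usable */
	}
}

/* Context switch hook: one flush covers every request queued so far. */
static void model_context_switch(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	if (flush_pending[cpu]) {
		flush_pending[cpu] = false;
		flushes_done[cpu]++;
	}
}

/* A CPU is exposed for as long as its flush is still pending. */
static bool model_cpu_exposed(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return flush_pending[cpu];
}

/* Length of one scheduler tick in microseconds for a given HZ. */
static unsigned long model_tick_period_us(unsigned long hz)
{
	assert(hz > 0);
	return 1000000UL / hz;
}
```

Two calls to model_unmap_deferred(0) followed by model_context_switch(1)
cost CPU 1 a single flush instead of two, which is where the savings come
from. CPUs 2 and 3 stay exposed until they switch, and
model_tick_period_us(100) gives the 10000us (10ms) figure above as the
tick length on a HZ=100 kernel.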

> process. Is that an acceptable trade-off?

You are not seriously asking whether creating a user-controllable ret2dir
attack window is an acceptable trade-off? April 1st was a few days ago.

Thanks,

	tglx