Re: [RFC PATCH v2 2/3] x86: cpu/bugs: add support for AMD ERAPS feature

On Mon, 2024-11-11 at 10:57 -0800, Dave Hansen wrote:
> On 11/11/24 08:39, Amit Shah wrote:
> > Remove explicit RET stuffing / filling on VMEXITs and context
> > switches on AMD CPUs with the ERAPS feature (Zen5+).
> 
> I'm not a super big fan of how this changelog is constructed.  I
> personally really like the "background, problem, solution" form.
> This starts with the solution before even telling us what it is.

The subject line is usually the one with the impact; but in this case I
couldn't fit all of it (add a new feature + highlight that we're
getting rid of existing mitigation mechanisms).  If you have ideas for
a better subject line, let me know.

> > With the Enhanced Return Address Prediction Security feature, any of
> > the following conditions triggers a flush of the RSB (aka RAP in AMD
> > manuals) in hardware:
> > * context switch (e.g., move to CR3)
> 
> IMNHO, context switching is a broader topic than process-to-process
> switches.  A user=>kernel thread switch is definitely a context
> switch, but doesn't involve a CR3 write.  A user=>user switch in the
> same mm is also highly arguable to be a context switch and doesn't
> involve a CR3 write.  There are userspace threading libraries that do
> "context switches".
> 
> This is further muddled by the very comments you are editing in this
> patch like: "unconditionally fill RSB on context switches".
> 
> Please be very, very precise in the language that's chosen here.

I don't disagree - I'm saying the same thing: a context switch is
broader than just a CR3 update.  But I take the point you're really
making - limit this description to "CR3 update", because that really
is the case this patch targets with the feature - along with the
arguments for why the other cases (like a user->kernel switch, or a
user->user switch in the same mm) are not affected by this series.

> > * TLB flush
> > * some writes to CR4
> 
> ... and only one of those is relevant for this patch.  Aren't the
> others just noise?

For the purposes of this patch, yes.  But if some other condition
requires Linux to flush the RSB on a non-CR3-write event in the future,
whoever hits it will find this commit.  That leads to fewer "this is
unknown, let's assume the worst" situations, somewhat like the comments
in patch 1.

> > The feature also explicitly tags host and guest addresses in the RSB
> > - eliminating the need for explicit flushing of the RSB on VMEXIT.
> > 
> > [RFC note: We'll wait for the APM to be updated with the real
> > wording, but assuming it's going to say the ERAPS feature works the
> > way described above, let's continue the discussion re: when the
> > kernel currently calls FILL_RETURN_BUFFER, and what dropping it
> > entirely means.
> > 
> > Dave Hansen pointed out __switch_to_asm fills the RSB each time it's
> > called, so let's address the cases there:
> > 
> > 1. user->kernel switch: Switching from userspace to kernelspace, and
> >    then using user-stuffed RSB entries in the kernel is not possible
> >    thanks to SMEP.  We can safely drop the FILL_RETURN_BUFFER call
> >    for this case.  In fact, this is how the original code was when
> >    dwmw2 added it originally in c995efd5a.  So while this case
> >    currently triggers an RSB flush (and will not after this ERAPS
> >    patch), the current flush isn't necessary for AMD systems with
> >    SMEP anyway.
> 
> This description makes me uneasy.  It basically boils down to, "SMEP
> guarantees that the kernel can't consume user-placed RSB entries."
> 
> It makes me uneasy because I recall that not holding true on some
> Intel CPUs.  I believe some CPUs have a partial-width RSB.  Kernel
> consumption of a user-placed entry would induce the kernel to
> speculate to a *kernel* address so SMEP is rather ineffective.
> 
> So, instead of just saying, "SMEP magically fixes everything, trust
> us", could we please step through a few of the properties of the
> hardware and software that make SMEP effective?

That's fair.  And that's really why I highlighted the first line in the
commit message: "we're really getting rid of software mitigations here
- focus on that please".  Though I'll note that none of the questions
raised in this series has come as a surprise to me - I asked those
exact questions of hardware engineers internally and had them answered
to my satisfaction; hence this patchset.


> First, I think that you're saying that AMD hardware has a full-width,
> precise RSB.  That ensures that speculation always happens back
> precisely to the original address, not some weird alias.  Second,
> ERAPS guarantees that the address still has the same stuff mapped
> there.  Any change to the address space involves either a CR3 write
> or a TLB flush, both of which would have flushed any user-placed
> content in the RSB.
> 
> 	Aside: Even the kernel-only text poking mm or EFI mm would
> 	"just work" in this scenario since they have their own
> 	mm_structs, page tables and root CR3 values.

All true.  David (Kaplan), please say if that's not right.

> > 2. user->user or kernel->kernel: If a user->user switch does not
> >    result in a CR3 change, it's a different thread in the same
> >    process context.  That's the same case for kernel->kernel switch.
> >    In this case, the RSB entries are still valid in that context,
> >    just not the correct ones in the new thread's context.  It's
> >    difficult to imagine this being a security risk.  The current
> >    code clearing it, and this patch not doing so for AMD-with-ERAPS,
> >    isn't a concern as far as I see.
> > ]
> 
> The current rules are dirt simple: if the kernel stack gets switched,
> the RSB is flushed.  It's kinda hard to have a mismatched stack if
> it's never switched in the first place.  The problem (stack
> switching) is dirt simple to correlate 1:1 with the fix (RSB
> stuffing).

... and it could be argued that's overkill.  These are security
mitigations - and clearing the RSB on stack switches isn't buying us
extra security here.  We may also be inviting performance penalties by
doing more than what's required, working against the hardware rather
than with it.

> This patch proposes separating the problem (stack switching) and the
> mitigations (implicit new microcode actions).  That's a hell of a lot
> more complicated and harder to audit than the existing rules.
> Agreed?

Partially.  We've got to trust the hardware and tools we've been given
- if the hardware says it behaves in a particular way on a given input
or event, we've got to trust that.  I'm building on top of what's been
given.

But: I like this thought experiment.  Let's continue.

> So, how are the rules relaxed?
> 
> First, it opens up the case where user threads consume RSB entries
> from other threads in the same process.  Threads are usually at the
> same privilege level.  Instead of using a speculative attack, an
> attacker could just read the data directly.  The only place I can
> imagine this even remotely being a problem would be if threads were
> relying on protection keys to keep secrets from each other.

It's a valid scenario, albeit one involving really badly designed
security/crypto software.  I'd imagine it's difficult to exploit even
then.

> The kernel consuming RSB entries from another kernel thread seems
> less clear.  I disagree with the idea that a valid entry in a given
> context is a _safe_ entry though.  Spectre v2 in _general_ involves
> nasty speculation to otherwise perfectly safe code.  A problematic
> scenario would be where kswapd wakes up after waiting for I/O and
> starts speculating back along the return path of the userspace
> thread that kswapd interrupted.  Userspace has some level of control
> over both stacks and it doesn't seem super far fetched to think that
> there could be some disclosure gadgets in there.
> 
> In short: the user-consumes-user-RSB-entry attacks seem fundamentally
> improbable because user threads are unlikely to have secrets from
> each other.
> 
> The kernel-consumes-kernel-RSB-entry attacks seem hard rather than
> fundamentally improbable.  I can't even count how many times our "oh
> that attack would be too hard to pull off" optimism has gone wrong.
> 
> What does that all _mean_?  ERAPS sucks?  Nah.  Maybe we just make
> sure that the existing spectre_v2=whatever controls can be used to
> stop relying on it if asked.

Hm, not yet convinced - but documenting this seems useful.  I can stick
this email verbatim in Documentation/ for the future, if you don't
mind?

Beyond that - there's the "mov cr3, cr3" we could plug into
__switch_to_asm when ERAPS is active.  I don't like it - but at least
that's 1:1.

> > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > index 96b410b1d4e8..f5ee7fc71db5 100644
> > --- a/arch/x86/include/asm/nospec-branch.h
> > +++ b/arch/x86/include/asm/nospec-branch.h
> > @@ -117,6 +117,18 @@
> >   * We define a CPP macro such that it can be used from both .S files and
> >   * inline assembly. It's possible to do a .macro and then include that
> >   * from C via asm(".include <asm/nospec-branch.h>") but let's not go there.
> > + *
> > + * AMD CPUs with the ERAPS feature may have a larger default RSB.  These CPUs
> > + * use the default number of entries on a host, and can optionally (based on
> > + * hypervisor setup) use 32 (old) or the new default in a guest.  The number
> > + * of default entries is reflected in CPUID 8000_0021:EBX[23:16].
> > + *
> > + * With the ERAPS feature, RSB filling is not necessary anymore: the RSB is
> > + * auto-cleared by hardware on context switches, TLB flushes, or some CR4
> > + * writes.  Adapting the value of RSB_CLEAR_LOOPS below for ERAPS would change
> > + * it to a runtime variable instead of the current compile-time constant, so
> > + * leave it as-is, as this works for both older CPUs, as well as newer ones
> > + * with ERAPS.
> >   */
> 
> This comment feels out of place and noisy to me.  Why the heck would
> we need to document the RSB size enumeration here and not use it?
> 
> Then this goes on to explain why this patch *didn't* bother to change
> RSB_CLEAR_LOOPS.  That seems much more like changelog material than a
> code comment to me.
> 
> To me, RSB_CLEAR_LOOPS is for, well, clearing the RSB.  The existing
> comments say that some CPUs don't need to clear the RSB.  I don't
> think we need further explanation of why one specific CPU doesn't
> need to clear the RSB.  The new CPU isn't special.
> 
> Something slightly more useful would be to actually read CPUID
> 8000_0021:EBX[23:16] and compare it to RSB_CLEAR_LOOPS:
> 
> void amd_check_rsb_clearing(void)
> {
> 	/*
> 	 * Systems with both these features off
> 	 * do not perform RSB clearing:
> 	 */
> 	if (!boot_cpu_has(X86_FEATURE_RSB_CTXSW) &&
> 	    !boot_cpu_has(X86_FEATURE_RSB_VMEXIT))
> 		return;
> 
> 	// with actual helpers and mask generation, not this mess:
> 	rsb_size = (cpuid_ebx(0x80000021) >> 16) & 0xff;
> 
> 	WARN_ON_ONCE(rsb_size > RSB_CLEAR_LOOPS);
> }
> 
> That's almost shorter than the proposed comment.

That's a good idea.

The comment-vs-commit-log choice is because the commit is otherwise in
a different file than the #define, and people reading the code can
miss it entirely.  I just prefer grep+read+git workflows for my
archaeology.

> >  #define RETPOLINE_THUNK_SIZE	32
> > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > index 0aa629b5537d..02446815b0de 100644
> > --- a/arch/x86/kernel/cpu/bugs.c
> > +++ b/arch/x86/kernel/cpu/bugs.c
> > @@ -1818,9 +1818,12 @@ static void __init spectre_v2_select_mitigation(void)
> >  	pr_info("%s\n", spectre_v2_strings[mode]);
> >  
> >  	/*
> > -	 * If Spectre v2 protection has been enabled, fill the RSB during a
> > -	 * context switch.  In general there are two types of RSB attacks
> > -	 * across context switches, for which the CALLs/RETs may be unbalanced.
> > +	 * If Spectre v2 protection has been enabled, the RSB needs to be
> > +	 * cleared during a context switch.  Either do it in software by
> > +	 * filling the RSB, or in hardware via ERAPS.
> 
> I'd move this comment about using ERAPS down to the ERAPS check
> itself.
> 
> > +	 * In general there are two types of RSB attacks across context
> > +	 * switches, for which the CALLs/RETs may be unbalanced.
> >  	 *
> >  	 * 1) RSB underflow
> >  	 *
> > @@ -1848,12 +1851,21 @@ static void __init spectre_v2_select_mitigation(void)
> >  	 *    RSB clearing.
> >  	 *
> >  	 * So to mitigate all cases, unconditionally fill RSB on context
> > -	 * switches.
> > +	 * switches when ERAPS is not present.
> >  	 */
> > -	setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
> > -	pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
> > +	if (!boot_cpu_has(X86_FEATURE_ERAPS)) {
> > +		setup_force_cpu_cap(X86_FEATURE_RSB_CTXSW);
> > +		pr_info("Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch\n");
> >  
> > -	spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
> > +		/*
> > +		 * For guest -> host (or vice versa) RSB poisoning scenarios,
> > +		 * determine the mitigation mode here.  With ERAPS, RSB
> > +		 * entries are tagged as host or guest - ensuring that neither
> > +		 * the host nor the guest have to clear or fill RSB entries to
> > +		 * avoid poisoning: skip RSB filling at VMEXIT in that case.
> > +		 */
> > +		spectre_v2_determine_rsb_fill_type_at_vmexit(mode);
> > +	}
> 
> This seems to be following a pattern of putting excessive amounts of
> very AMD-specific commenting right in the middle of common code.
> There's a *HUGE* comment in
> spectre_v2_determine_rsb_fill_type_at_vmexit().  Wouldn't this be
> more at home there?

I think with Josh's reworded comments, and with the comments stripped
out of spectre_v2_determine_...(), it flows much better - so yes, will
update.

Thank you for the thorough review, David!

		Amit



