Re: [RFC PATCH v2 1/3] x86: cpu/bugs: update SpectreRSB comments for AMD

Andrew Cooper <andrew.cooper3@xxxxxxxxxx> · Tue, 12 Nov 2024 00:29:28 +0000

On 11/11/2024 7:33 pm, Josh Poimboeuf wrote:
> On Mon, Nov 11, 2024 at 05:39:11PM +0100, Amit Shah wrote:
>> From: Amit Shah <amit.shah@xxxxxxx>
>>
>> AMD CPUs do not fall back to the BTB when the RSB underflows for RET
>> address speculation.  AMD CPUs have not needed to stuff the RSB for
>> underflow conditions.
>>
>> The RSB poisoning case is addressed by RSB filling - clean up the FIXME
>> comment about it.
> I'm thinking the comments need more clarification in light of BTC and
> SRSO.
>
> This:
>
>> -	 *    AMD has it even worse: *all* returns are speculated from the BTB,
>> -	 *    regardless of the state of the RSB.
> is still true (mostly: "all" should be "some"), though it doesn't belong
> in the "RSB underflow" section.
>
> Also the RSB stuffing not only mitigates RET, it mitigates any other
> instruction which happens to be predicted as a RET.  Which is presumably
> why it's still needed even when SRSO is enabled.
>
> Something like below?
>
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 47a01d4028f6..e95d3aa14259 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -1828,9 +1828,6 @@ static void __init spectre_v2_select_mitigation(void)
>  	 *    speculated return targets may come from the branch predictor,
>  	 *    which could have a user-poisoned BTB or BHB entry.
>  	 *
> -	 *    AMD has it even worse: *all* returns are speculated from the BTB,
> -	 *    regardless of the state of the RSB.
> -	 *
>  	 *    When IBRS or eIBRS is enabled, the "user -> kernel" attack
>  	 *    scenario is mitigated by the IBRS branch prediction isolation
>  	 *    properties, so the RSB buffer filling wouldn't be necessary to
> @@ -1850,10 +1847,22 @@ static void __init spectre_v2_select_mitigation(void)
>  	 *    The "user -> user" scenario, also known as SpectreBHB, requires
>  	 *    RSB clearing.
>  	 *
> +	 *    AMD Branch Type Confusion (aka "AMD retbleed") adds some
> +	 *    additional wrinkles:
> +	 *
> +	 *      - A RET can be mispredicted as a direct or indirect branch,
> +	 *        causing the CPU to speculatively branch to a BTB target, in
> +	 *        which case the RSB filling obviously doesn't help.  That case
> +	 *        is mitigated by removing all the RETs (SRSO mitigation).
> +	 *
> +	 *      - The RSB is not only used for architectural RET instructions,
> +	 *        it may also be used for other instructions which happen to
> +	 *        get mispredicted as RETs.  Therefore RSB filling is still
> +	 *        needed even when the RETs have all been removed by the SRSO
> +	 *        mitigation.

This is my take.  On AMD CPUs, there are two unrelated issues to take
into account:

1) SRSO

Affects anything which doesn't enumerate SRSO_NO, which is all parts to
date, including Zen5.

SRSO ends up overflowing the RAS with arbitrary BTB targets, such that a
subsequent genuine RET follows a prediction which never came from a real
CALL instruction.
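To make that failure mode concrete, here is a toy C model of the RAS.  It
assumes a fixed depth (RAS_DEPTH) and that an entry pushed by a squashed
"phantom" CALL is simply left behind.  The names (struct ras, ras_push,
ras_pop, srso_demo) are hypothetical and this is not a model of any real
Zen pipeline; it only shows how a genuine RET can end up consuming a
prediction that no real CALL produced.

```c
#include <stdint.h>

/*
 * Toy return address stack (RAS).  RAS_DEPTH and the "no rollback after
 * a squashed CALL" behaviour are illustrative assumptions only.
 */
#define RAS_DEPTH 8

struct ras {
	uint64_t slot[RAS_DEPTH];
	unsigned int top;	/* pushes minus pops; slot index is top % RAS_DEPTH */
};

/* A predicted CALL pushes its return address; once the stack wraps,
 * the oldest entry is silently overwritten. */
void ras_push(struct ras *r, uint64_t ret_addr)
{
	r->slot[r->top % RAS_DEPTH] = ret_addr;
	r->top++;
}

/* A RET takes its predicted target from the most recent entry. */
uint64_t ras_pop(struct ras *r)
{
	r->top--;
	return r->slot[r->top % RAS_DEPTH];
}

/*
 * SRSO in miniature: a real CALL pushes a genuine return address, then
 * an attacker-trained phantom CALL (a CALL prediction for something that
 * is not architecturally a CALL) pushes an arbitrary BTB target.  The
 * speculation is squashed but, in this toy, the entry stays, so the next
 * genuine RET is predicted to a target no real CALL ever pushed.
 */
uint64_t srso_demo(uint64_t genuine_ret, uint64_t phantom_target)
{
	struct ras r = { .top = 0 };

	ras_push(&r, genuine_ret);	/* architectural CALL */
	ras_push(&r, phantom_target);	/* phantom CALL, never retired */
	return ras_pop(&r);		/* prediction used by the genuine RET */
}
```

srso_demo(0x1000, 0xbad0) yields 0xbad0 rather than the genuine 0x1000.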

Mitigations for SRSO are either safe-ret or IBPB-on-entry.  Parts
without IBPB_RET (where IBPB does not also flush the RAS) that use
IBPB-on-entry need to flush the RAS manually.

Importantly, SMEP does not protect you against SRSO across the
user->kernel boundary, because the bad RAS entries are arbitrary.  New
in Zen5 is the SRSO_U/S_NO bit, which says this case can no longer
occur.  So on Zen5, you can in principle get away without a RAS flush on
entry.
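The entry-path reasoning above can be summarised as a small decision
function.  This is a sketch under stated assumptions: struct amd_caps and
its fields are hypothetical stand-ins for the SRSO_NO, SRSO_U/S_NO and
IBPB_RET enumeration bits (not the kernel's X86_FEATURE_* definitions),
and it answers only the SRSO question, not whether RSB filling is wanted
for other reasons.

```c
#include <stdbool.h>

/* Hypothetical flags mirroring the enumeration bits named above. */
struct amd_caps {
	bool srso_no;		/* SRSO_NO: not affected by SRSO */
	bool srso_us_no;	/* SRSO_U/S_NO (Zen5): no SRSO across user->kernel */
	bool ibpb_ret;		/* IBPB_RET: IBPB also flushes the RAS */
};

enum srso_mitigation {
	SRSO_MIT_NONE,
	SRSO_MIT_SAFE_RET,
	SRSO_MIT_IBPB_ON_ENTRY,
};

/* Does user->kernel entry need a manual RAS flush for SRSO's sake? */
bool srso_needs_ras_flush_on_entry(const struct amd_caps *c,
				   enum srso_mitigation m)
{
	/* Unaffected, or (in principle) unaffected across user->kernel. */
	if (c->srso_no || c->srso_us_no)
		return false;

	/* IBPB-on-entry without IBPB_RET leaves the RAS contents intact. */
	if (m == SRSO_MIT_IBPB_ON_ENTRY && !c->ibpb_ret)
		return true;

	/* Only the case above is called out as needing a manual flush. */
	return false;
}
```

A part without IBPB_RET using IBPB-on-entry gets true; the same part with
SRSO_U/S_NO set gets false.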

2) BTC

Affects anything which doesn't enumerate BTC_NO, which is Zen2 and older
(Fam17h for AMD, Fam18h for Hygon).

An attacker can forge any branch type prediction, and the most dangerous
one is RET-mispredicted-as-INDIRECT.  This causes a genuine RET
instruction to follow a prediction that was believed to be an indirect
branch.

All CPUs which suffer BTC also suffer SRSO, so while jmp2ret is a
mitigation for BTC, its utility dropped to zero when SRSO was discovered.
(Which is a shame, because it's equal parts beautiful and terrifying.)
Mitigations for BTC are therefore safe-ret or IBPB-on-entry.

Flushing the RAS has no effect on BTC, because the whole problem with
BTC is that the prediction comes from the "wrong" predictor, but you
still need to do it for other reasons.
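A toy predictor makes the "wrong predictor" point concrete.  It assumes,
purely for illustration, that the BTB entry for an address carries a
forgeable branch type plus a target, and that only a RET-typed prediction
consults the RAS.  The types and predict_target() are hypothetical.  With
a forged BR_INDIRECT entry on a genuine RET, the predicted target comes
from the BTB regardless of what the RAS holds, which is why filling the
RAS cannot fix BTC.

```c
#include <stdint.h>

/* Illustrative branch types; not any real BTB encoding. */
enum br_type { BR_NONE, BR_DIRECT, BR_INDIRECT, BR_RET };

struct btb_entry {
	enum br_type type;	/* predicted type, possibly forged */
	uint64_t target;	/* BTB-provided target */
};

/*
 * Predicted next fetch address for one instruction.  Only a RET-typed
 * prediction reads the RAS; any other type uses the BTB target.
 */
uint64_t predict_target(const struct btb_entry *e, uint64_t ras_top,
			uint64_t fallthrough)
{
	switch (e->type) {
	case BR_RET:
		return ras_top;		/* RAS filling only matters here */
	case BR_DIRECT:
	case BR_INDIRECT:
		return e->target;	/* BTC: RET mispredicted as this */
	default:
		return fallthrough;	/* no prediction: sequential fetch */
	}
}
```

Changing ras_top (e.g. to a benign fill value) alters the result only for
BR_RET entries, never for a forged BR_INDIRECT one.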

~Andrew