Re: [PATCH v4 2/5] arm64: errata: Assume that unknown CPUs _are_ vulnerable to Spectre BHB

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,


On Wed, Jan 29, 2025 at 8:43 AM James Morse <james.morse@xxxxxxx> wrote:
>
> Hi Doug,
>
> On 07/01/2025 20:05, Douglas Anderson wrote:
> > The code for detecting CPUs that are vulnerable to Spectre BHB was
> > based on a hardcoded list of CPU IDs that were known to be affected.
> > Unfortunately, the list mostly only contained the IDs of standard ARM
> > cores. The IDs for many cores that are minor variants of the standard
> > ARM cores (like many Qualcomm Kyro CPUs) weren't listed. This led the
> > code to assume that those variants were not affected.
> >
> > Flip the code on its head and instead assume that a core is vulnerable
> > if it doesn't have CSV2_3 but is unrecognized as being safe. This
> > involves creating a "Spectre BHB safe" list.
> >
> > As of right now, the only CPU IDs added to the "Spectre BHB safe" list
> > are ARM Cortex A35, A53, A55, A510, and A520. This list was created by
> > looking for cores that weren't listed in ARM's list [1] as per review
> > feedback on v2 of this patch [2]. Additionally Brahma A53 is added as
> > per mailing list feedback [3].
> >
> > NOTE: this patch will not actually _mitigate_ anyone, it will simply
> > cause them to report themselves as vulnerable. If any cores in the
> > system are reported as vulnerable but not mitigated then the whole
> > system will be reported as vulnerable though the system will attempt
> > to mitigate with the information it has about the known cores.
>
> > arch/arm64/include/asm/spectre.h |   1 -
> > arch/arm64/kernel/proton-pack.c  | 203 ++++++++++++++++---------------
> > 2 files changed, 102 insertions(+), 102 deletions(-)
>
> This is a pretty hefty diff-stat for adding a list of six CPUs. It looks like there are
> multiple things going on here: I think you're adding the 'safe' list of CPUs, then
> removing the list of firmware-mitigated list, then removing some indentation to do the
> mitigation detection differently. Any chance this can be split up?

I have to get back into the headspace of this patch since it's been a
few weeks, but let's see.

At the end, you suggest that maybe we could simplify this patch by
just adding the function is_spectre_bhb_safe(), calling it at the
beginning of is_spectre_bhb_affected(), and then changing
is_spectre_bhb_affected() to return "true" at the end instead of
false. Looking back at the patch, I believe this is correct.

...that being said, all of the other changes that are part of this
patch came in response to review feedback on earlier versions of the
patch and do make sense. Specifically:

1. Once the default value is to return false, there's not a reason to
call supports_clearbhb() and is_spectre_bhb_fw_affected() in
is_spectre_bhb_affected(), right? Those two functions _only_ job was
to be able to cause is_spectre_bhb_affected() to sometimes return
"true" instead of "false", but the function returns true by default
now so the calls have no purpose.

2. After we do #1, it doesn't really make sense to still have the
snippet of code that looks like this:

  if (spectre_bhb_loop_affected(scope))
    return true;

  return true;

We could just change that to:

  spectre_bhb_loop_affected(scope)
  return true;

...but then it seems really strange to call a function that returns a
value and we're ignoring it. As per review feedback on the previous
versions, I moved the calculation of "max_bhb_k" out of
spectre_bhb_loop_affected() and made that just for local scope and it
turned out much cleaner.

3. Given that we no longer have a reason to call
is_spectre_bhb_fw_affected() in is_spectre_bhb_affected(), it also
made sense to get rid of the "scope" variable for that function and
simplify it and the other caller.

...so I could break some of that stuff up into separate patches, but I
don't think we'd just want to drop any of the changes.


> (I'm not sure about the last chunk - it breaks automatic backporting)

Why would it break things? This entire series should just be
backported including the change of the default value to "affected". If
someone is running an old "stable" kernel and they're missing a
mitigation this would make it more obvious to them. Without
backporting everything then people running old kernels but lacking a
needed FW mitigation would show as "unaffected" when really they are
"vulnerable".


> > diff --git a/arch/arm64/kernel/proton-pack.c b/arch/arm64/kernel/proton-pack.c
> > index e149efadff20..17aa836fe46d 100644
> > --- a/arch/arm64/kernel/proton-pack.c
> > +++ b/arch/arm64/kernel/proton-pack.c
>
>
> > +static u8 spectre_bhb_loop_affected(void)
> >  {
> >       u8 k = 0;
> > -     static u8 max_bhb_k;
> > -
> > -     if (scope == SCOPE_LOCAL_CPU) {
> > -             static const struct midr_range spectre_bhb_k32_list[] = {
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_A78AE),
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C),
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_X1),
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_X2),
> > -                     MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
> > -                     MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
> > -                     {},
> > -             };
> > -             static const struct midr_range spectre_bhb_k24_list[] = {
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_A76),
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_A77),
> > -                     MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
> > -                     MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_4XX_GOLD),
> > -                     {},
> > -             };
> > -             static const struct midr_range spectre_bhb_k11_list[] = {
> > -                     MIDR_ALL_VERSIONS(MIDR_AMPERE1),
> > -                     {},
> > -             };
> > -             static const struct midr_range spectre_bhb_k8_list[] = {
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
> > -                     MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
> > -                     {},
> > -             };
> > -
> > -             if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
> > -                     k = 32;
> > -             else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k24_list))
> > -                     k = 24;
> > -             else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k11_list))
> > -                     k = 11;
> > -             else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k8_list))
> > -                     k =  8;
> > -
> > -             max_bhb_k = max(max_bhb_k, k);
> > -     } else {
> > -             k = max_bhb_k;
> > -     }
> > +
> > +     static const struct midr_range spectre_bhb_k32_list[] = {
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_A78),
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_A78AE),
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_A78C),
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_X1),
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_A710),
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_X2),
> > +             MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N2),
> > +             MIDR_ALL_VERSIONS(MIDR_NEOVERSE_V1),
> > +             {},
> > +     };
> > +     static const struct midr_range spectre_bhb_k24_list[] = {
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_A76),
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_A77),
> > +             MIDR_ALL_VERSIONS(MIDR_NEOVERSE_N1),
> > +             MIDR_ALL_VERSIONS(MIDR_QCOM_KRYO_4XX_GOLD),
> > +             {},
> > +     };
> > +     static const struct midr_range spectre_bhb_k11_list[] = {
> > +             MIDR_ALL_VERSIONS(MIDR_AMPERE1),
> > +             {},
> > +     };
> > +     static const struct midr_range spectre_bhb_k8_list[] = {
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_A72),
> > +             MIDR_ALL_VERSIONS(MIDR_CORTEX_A57),
> > +             {},
> > +     };
> > +
> > +     if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k32_list))
> > +             k = 32;
> > +     else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k24_list))
> > +             k = 24;
> > +     else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k11_list))
> > +             k = 11;
> > +     else if (is_midr_in_range_list(read_cpuid_id(), spectre_bhb_k8_list))
> > +             k =  8;
> >
> >       return k;
> >  }
>
> This previous hunk reduces the indent to remove the static variable from inside the
> function. Your patch-1 can be picked up automatically by stable branches, but after this
> change, that will have to be done by hand.

As per above, I think the whole series should simply be backported so
this shouldn't be an issue.


> Arm have recently updated that table of CPUs
> with extra entries (thanks for picking those up!) - but now that patch can't be easily
> applied to older kernels.
> I suspect making the reporting assuming-vulnerable may make other CPUs come out of the
> wood work too...
>
> Could we avoid changing this unless we really need to?

Will / Catalin: Do either of you have an opinion here?


> As background, the idea is that CPUs that are newer than the kernel will discover the need
> for mitigation from firmware. That sucks for performance, but it can be improved once the
> kernel is updated to know about the 'local' workaround.
>
>
> > @@ -955,6 +956,8 @@ static bool supports_ecbhb(int scope)
> >                                                   ID_AA64MMFR1_EL1_ECBHB_SHIFT);
> >  }
> >
> > +static u8 max_bhb_k;
> > +
> >  bool is_spectre_bhb_affected(const struct arm64_cpu_capabilities *entry,
> >                            int scope)
> >  {
> > @@ -963,16 +966,18 @@ bool is_spectre_bhb_affected(const struct arm64_cpu_capabilities *entry,
> >       if (supports_csv2p3(scope))
> >               return false;
> >
> > -     if (supports_clearbhb(scope))
> > -             return true;
> > -
> > -     if (spectre_bhb_loop_affected(scope))
> > -             return true;
> > +     if (is_spectre_bhb_safe(scope))
> > +             return false;
> >
> > -     if (is_spectre_bhb_fw_affected(scope))
> > -             return true;
> > +     /*
> > +      * At this point the core isn't known to be "safe" so we're going to
> > +      * assume it's vulnerable. We still need to update `max_bhb_k` though,
> > +      * but only if we aren't mitigating with clearbhb though.
>
> +       or firmware.

No, I believe my comment is correct as-is. The code in
spectre_bhb_enable_mitigation() prior to my change prefers a loop
mitigation over a firmware mitigation. My patch doesn't change that
behavior. So we'll always update `max_bhb_k` even if there is a FW
mitigation that might be in place. Updating it by comparing it to a
value that might be 0 is still an "update" in my mind.

If a FW mitigation _is_ needed for a given core then
spectre_bhb_loop_affected() should return a "k" of 0. That will fall
through to the FW mitigation.

...so in the context of this code here, we always will "update"
`max_bhb_k` regardless of whether there is a firmware mitigation.


> > +      */
> > +     if (scope == SCOPE_LOCAL_CPU && !supports_clearbhb(SCOPE_LOCAL_CPU))
> > +             max_bhb_k = max(max_bhb_k, spectre_bhb_loop_affected());
>
> CPUs that need a firmware mitigation will get in here too. Its probably harmless as
> they'll report '0' as their k value... but you went to the trouble to weed out the CPUs
> that support the clearbhb instruction ...

In order to understand why the patch does what it does, I got all my
understanding of how mitigations should be applied by maintaining the
existing behavior of spectre_bhb_enable_mitigation(). Specifically, if
a FW mitigation is _needed_ for a given core then it's important that
spectre_bhb_loop_affected() return 0. This means that whoever wrote
that code assumes that the loop mitigation trumps the FW mitigation.

I don't think there's any other reason for weeding more CPUs out.


> For the change described in the commit message, isn't it enough to replace the final
> 'return false' with 'return !is_spectre_bhb_safe(scope)' ?

See my response to your first comment.


So I guess in summary I won't plan on sending a v5 at this time and
I'd hope that v4 can still land (maybe Qualcomm can give an Ack?). If
necessary I could break this patch into more than one to break the
diffstat into multiple patches, but I'd prefer to still keep the
resulting code the same. I like how it turned out and it matches all
the existing review feedback that Julius provided. If Will or Catalin
wants to come in or enough other people come in and vote that I should
make more changes, then I could do it.

I could move the last two patches up to the front and send a v5 if
needed. That would make the short term job of backporting those
faster, but any future cores would have the same problem. As per
above, I'm of the opinion that the whole series should go to stable
for as far back as people are willing to take them (which is why I
CCed stable on all the patches)


-Doug





[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux