Re: [STABLE REGRESSION] Possible missing backport of x86_match_cpu() change in v6.1.96

[CCing the x86 folks, Greg, and the regressions list]

Hi, Thorsten here, the Linux kernel's regression tracker.

On 30.07.24 18:41, Thomas Lindroth wrote:
> I upgraded from kernel 6.1.94 to 6.1.99 on one of my machines and
> noticed that the dmesg line "Incomplete global flushes, disabling
> PCID" had disappeared from the log.

Thomas, thx for the report. FWIW, mainline developers like the x86 folks
or Tony are free to focus on mainline and leave stable/longterm series
to other people -- some nevertheless help out regularly or occasionally.
So with a bit of luck this mail will make one of them care enough to
provide a 6.1 version of what, afaics, you called the "existing fix" in
mainline (2eda374e883ad2 ("x86/mm: Switch to new Intel CPU model
defines") [v6.10-rc1]), which seems to be missing in 6.1.y. If not, I
suspect it might be up to you to prepare and submit a 6.1.y variant of
that fix, as you seem to care and are able to test the patch.

Ciao, Thorsten

> That message comes from commit c26b9e193172f48cd0ccc64285337106fb8aa804,
> which disables PCID support on some broken hardware in
> arch/x86/mm/init.c:
> 
> #define INTEL_MATCH(_model) { .vendor  = X86_VENDOR_INTEL,     \
>                              .family  = 6,                     \
>                              .model = _model,                  \
>                            }
> /*
>  * INVLPG may not properly flush Global entries
>  * on these CPUs when PCIDs are enabled.
>  */
> static const struct x86_cpu_id invlpg_miss_ids[] = {
>        INTEL_MATCH(INTEL_FAM6_ALDERLAKE   ),
>        INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
>        INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
>        INTEL_MATCH(INTEL_FAM6_RAPTORLAKE  ),
>        INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
>        INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
>        {}
> };
> 
> ...
> 
> if (x86_match_cpu(invlpg_miss_ids)) {
>         pr_info("Incomplete global flushes, disabling PCID");
>         setup_clear_cpu_cap(X86_FEATURE_PCID);
>         return;
> }
> 
> arch/x86/mm/init.c, which has that code, hasn't changed in
> 6.1.94 -> 6.1.99. However, I found a commit changing how
> x86_match_cpu() behaves in 6.1.96:
> 
> commit 8ab1361b2eae44077fef4adea16228d44ffb860c
> Author: Tony Luck <tony.luck@xxxxxxxxx>
> Date:   Mon May 20 15:45:33 2024 -0700
> 
>     x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
> 
> I suspect this broke the PCID disabling code in arch/x86/mm/init.c.
> The commit message says:
>
> "Add a new flags field to struct x86_cpu_id that has a bit set to
> indicate that this entry in the array is valid. Update X86_MATCH*()
> macros to set that bit. Change the end-marker check in x86_match_cpu()
> to just check the flags field for this bit."
> 
> But the PCID disabling code in 6.1.99 does not use the X86_MATCH*()
> macros; instead, it defines its own INTEL_MATCH() macro, which does
> not set the X86_CPU_ID_FLAG_ENTRY_VALID flag.
> 
> I looked in upstream git and found an existing fix:
> commit 2eda374e883ad297bd9fe575a16c1dc850346075
> Author: Tony Luck <tony.luck@xxxxxxxxx>
> Date:   Wed Apr 24 11:15:18 2024 -0700
> 
>     x86/mm: Switch to new Intel CPU model defines
> 
>     New CPU #defines encode vendor and family as well as model.
> 
>     [ dhansen: vertically align 0's in invlpg_miss_ids[] ]
> 
>     Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
>     Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>     Signed-off-by: Borislav Petkov (AMD) <bp@xxxxxxxxx>
>     Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com
> 
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index 679893ea5e68..6b43b6480354 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -261,21 +261,17 @@ static void __init probe_page_size_mask(void)
>         }
>  }
>  
> -#define INTEL_MATCH(_model) { .vendor  = X86_VENDOR_INTEL,     \
> -                             .family  = 6,                     \
> -                             .model = _model,                  \
> -                           }
>  /*
>   * INVLPG may not properly flush Global entries
>   * on these CPUs when PCIDs are enabled.
>   */
>  static const struct x86_cpu_id invlpg_miss_ids[] = {
> -       INTEL_MATCH(INTEL_FAM6_ALDERLAKE   ),
> -       INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
> -       INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
> -       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE  ),
> -       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
> -       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
> +       X86_MATCH_VFM(INTEL_ALDERLAKE,      0),
> +       X86_MATCH_VFM(INTEL_ALDERLAKE_L,    0),
> +       X86_MATCH_VFM(INTEL_ATOM_GRACEMONT, 0),
> +       X86_MATCH_VFM(INTEL_RAPTORLAKE,     0),
> +       X86_MATCH_VFM(INTEL_RAPTORLAKE_P,   0),
> +       X86_MATCH_VFM(INTEL_RAPTORLAKE_S,   0),
>         {}
>  };
> 
> The fix removed the custom INTEL_MATCH macro and uses the X86_MATCH*()
> macros, which set X86_CPU_ID_FLAG_ENTRY_VALID. That fix was never
> backported to 6.1, so this looks like a stable-series regression caused
> by a missing backport.
> 
> If I apply the fix on top of 6.1.99, the PCID disabling code activates
> again. To make it build, I had to change all the INTEL_* definitions
> back to the old INTEL_FAM6_* definitions:
> 
>  static const struct x86_cpu_id invlpg_miss_ids[] = {
> -       INTEL_MATCH(INTEL_FAM6_ALDERLAKE   ),
> -       INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
> -       INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
> -       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE  ),
> -       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
> -       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
> +       X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE,    0),
> +       X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_L,  0),
> +       X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_N,  0),
> +       X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE,   0),
> +       X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_P, 0),
> +       X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_S, 0),
>         {}
>  };
> 
> I only looked at the code in arch/x86/mm/init.c, so there may be other
> uses of x86_match_cpu() in the kernel that are also broken in 6.1.99.
> This email is meant as a bug report, not a pull request. Someone else
> should confirm the problem and submit the appropriate fix.

P.S.:

#regzbot ^introduced 8ab1361b2eae44
#regzbot title x86:  Possible missing backport of x86_match_cpu() change
#regzbot ignore-activity