On Mon, 2024-09-23 at 19:45 -0700, Ricardo Neri wrote:
> On Thu, Sep 19, 2024 at 01:19:27PM +0200, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> > On Wed, Sep 18, 2024 at 06:54:33AM +0000, Zhang, Rui wrote:
> > > On Mon, 2024-08-12 at 14:11 +0200, Greg KH wrote:
> > > > On Wed, Aug 07, 2024 at 10:15:23AM +0200, Thorsten Leemhuis wrote:
> > > > > [CCing the x86 folks, Greg, and the regressions list]
> > > > >
> > > > > Hi, Thorsten here, the Linux kernel's regression tracker.
> > > > >
> > > > > On 30.07.24 18:41, Thomas Lindroth wrote:
> > > > > > I upgraded from kernel 6.1.94 to 6.1.99 on one of my machines
> > > > > > and noticed that the dmesg line "Incomplete global flushes,
> > > > > > disabling PCID" had disappeared from the log.
> > > > >
> > > > > Thomas, thx for the report. FWIW, mainline developers like the
> > > > > x86 folks or Tony are free to focus on mainline and leave
> > > > > stable/longterm series to other people -- some nevertheless
> > > > > help out regularly or occasionally. So with a bit of luck this
> > > > > mail will make one of them care enough to provide a 6.1 version
> > > > > of what you afaics called the "existing fix" in mainline
> > > > > (2eda374e883ad2 ("x86/mm: Switch to new Intel CPU model
> > > > > defines") [v6.10-rc1]) that seems to be missing in 6.1.y. But
> > > > > if not I suspect it might be up to you to prepare and submit a
> > > > > 6.1.y variant of that fix, as you seem to care and are able to
> > > > > test the patch.
> > > >
> > > > Needs to go to 6.6.y first, right? But even then, it does not
> > > > apply to 6.1.y cleanly, so someone needs to send a backported
> > > > (and tested) series to us at stable@xxxxxxxxxxxxxxx and we will
> > > > be glad to queue them up then.
> > > >
> > > > thanks,
> > > >
> > > > greg k-h
> > >
> > > There are three commits involved.
> > >
> > > commit A:
> > >    4db64279bc2b ("x86/cpu: Switch to new Intel CPU model defines")
> > >    This commit replaces
> > >       X86_MATCH_INTEL_FAM6_MODEL(ANY, 1), /* SNC */
> > >    with
> > >       X86_MATCH_VFM(INTEL_ANY, 1), /* SNC */
> > >    This is a functional change because the family info is replaced
> > >    with 0. And this exposes a x86_match_cpu() problem that it breaks
> > >    when the vendor/family/model/stepping/feature fields are all
> > >    zeros.
> > >
> > > commit B:
> > >    93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just
> > >    X86_VENDOR_INTEL")
> > >    It addresses the x86_match_cpu() problem by introducing a valid
> > >    flag and set the flag in the Intel CPU model defines.
> > >    This fixes commit A, but it actually breaks the x86_cpu_id
> > >    structures that are constructed without using the Intel CPU model
> > >    defines, like arch/x86/mm/init.c.
> > >
> > > commit C:
> > >    2eda374e883a ("x86/mm: Switch to new Intel CPU model defines")
> > >    arch/x86/mm/init.c: broke by commit B but fixed by using the new
> > >    Intel CPU model defines
> > >
> > > In 6.1.99,
> > >    commit A is missing
> > >    commit B is there
> > >    commit C is missing
> > >
> > > In 6.6.50,
> > >    commit A is missing
> > >    commit B is there
> > >    commit C is missing
> > >
> > > Now we can fix the problem in stable kernel, by converting
> > > arch/x86/mm/init.c to use the CPU model defines (even the old style
> > > ones). But before that, I'm wondering if we need to backport commit
> > > B in 6.1 and 6.6 stable kernel because only commit A can expose this
> > > problem.
> >
> > If so, can you submit the needed backports for us to apply? That's
> > the easiest way for us to take them, thanks.
>
> I audited all the uses of x86_match_cpu(match).
> All callers that construct the `match` argument using the family of
> X86_MATCH_* macros from arch/x86/include/asm/cpu_device_id.h function
> correctly because the commit B has been backported to v6.1.99 and to
> v6.6.50 -- 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just
> X86_VENDOR_INTEL").
>
> Only those callers that use their own thing to compose the `match`
> argument are buggy:
>    * arch/x86/mm/init.c
>    * drivers/powercap/intel_rapl_msr.c (only in 6.1.99)

Thanks for auditing this. I overlooked the intel_rapl driver case.

>
> Summarizing, v6.1.99 needs these two commits from mainline
>    * d05b5e0baf42 ("powercap: RAPL: fix invalid initialization for
>      pl4_supported field")
>    * 2eda374e883a ("x86/mm: Switch to new Intel CPU model defines")
>
> v6.6.50 only needs the second commit.

Well, commit B 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just
X86_VENDOR_INTEL") is backported to all stable kernels. And the above
two broken cases are also there. So I suppose we need to backport all of
them to 5.x stable kernel as well.

thanks,
rui

>
> I will submit these backports.
>
> Thanks and BR,
> Ricardo
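
For reference, the interaction between commits A, B and C discussed in this thread can be modelled in a few lines of user-space C. This is only a rough sketch, not kernel code: struct cpu_id, the macros and both match functions are hypothetical simplifications of struct x86_cpu_id and x86_match_cpu(). It assumes that the Intel vendor ID and the "any" family/model values are all zero, which is what the description of commit A implies, and the family/model pair 6/0x97 is just an illustrative value.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified, hypothetical stand-in for the kernel's struct x86_cpu_id. */
struct cpu_id {
	unsigned int vendor;
	unsigned int family;
	unsigned int model;
	unsigned int flags;
};

#define VENDOR_INTEL     0u /* assumption: mirrors a zero Intel vendor ID */
#define WILDCARD         0u /* assumption: "any" family/model is zero */
#define FLAG_ENTRY_VALID 1u /* stand-in for the valid flag from commit B */

/* One table entry matches if the vendor is equal and family/model are
 * either wildcards or equal to the CPU's values. */
static int entry_matches(const struct cpu_id *e, const struct cpu_id *cpu)
{
	if (e->vendor != cpu->vendor)
		return 0;
	if (e->family != WILDCARD && e->family != cpu->family)
		return 0;
	if (e->model != WILDCARD && e->model != cpu->model)
		return 0;
	return 1;
}

/* Walk without the valid flag: an all-zero entry ends the table. */
static const struct cpu_id *match_zero_terminated(const struct cpu_id *t,
						  const struct cpu_id *cpu)
{
	for (; t->vendor | t->family | t->model; t++)
		if (entry_matches(t, cpu))
			return t;
	return NULL;
}

/* Walk with the valid flag: the first unflagged entry ends the table. */
static const struct cpu_id *match_flag_terminated(const struct cpu_id *t,
						  const struct cpu_id *cpu)
{
	for (; t->flags & FLAG_ENTRY_VALID; t++)
		if (entry_matches(t, cpu))
			return t;
	return NULL;
}

static const struct cpu_id this_cpu = { VENDOR_INTEL, 6, 0x97, 0 };

/* "Any Intel CPU" entry as after commit A: every field is zero. */
static const struct cpu_id any_intel_unflagged[] = {
	{ VENDOR_INTEL, WILDCARD, WILDCARD, 0 }, { 0 }
};
/* The same entry once the defines set the valid flag (commit B). */
static const struct cpu_id any_intel_flagged[] = {
	{ VENDOR_INTEL, WILDCARD, WILDCARD, FLAG_ENTRY_VALID }, { 0 }
};
/* Hand-built entry that never sets the flag (the buggy callers). */
static const struct cpu_id hand_built[] = {
	{ VENDOR_INTEL, 6, 0x97, 0 }, { 0 }
};
/* Hand-built entry converted to defines that set the flag (commit C). */
static const struct cpu_id converted[] = {
	{ VENDOR_INTEL, 6, 0x97, FLAG_ENTRY_VALID }, { 0 }
};

/* Print which table/walk combinations find a match for this_cpu. */
static void print_results(void)
{
	printf("A only:        %s\n",
	       match_zero_terminated(any_intel_unflagged, &this_cpu) ? "match" : "no match");
	printf("A + B:         %s\n",
	       match_flag_terminated(any_intel_flagged, &this_cpu) ? "match" : "no match");
	printf("B, hand-built: %s\n",
	       match_flag_terminated(hand_built, &this_cpu) ? "match" : "no match");
	printf("B + C:         %s\n",
	       match_flag_terminated(converted, &this_cpu) ? "match" : "no match");
}
```

Calling print_results() from a main() shows the pattern in this model: with only commit B applied, a hand-built table looks empty to the flag-terminated walk, which would correspond to the arch/x86/mm/init.c and intel_rapl_msr.c cases above.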