Re: [patch 5/9] x86: Cure per CPU madness on UP

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On Mon, Mar 04, 2024 at 11:12:23AM +0100, Thomas Gleixner wrote:
> On UP builds sparse complains rightfully about accesses to cpu_info with
> per CPU accessors:
> 
> cacheinfo.c:282:30: sparse: warning: incorrect type in initializer (different address spaces)
> cacheinfo.c:282:30: sparse:    expected void const [noderef] __percpu *__vpp_verify
> cacheinfo.c:282:30: sparse:    got unsigned int *
> 
> The reason is that on UP builds cpu_info which is a per CPU variable on SMP
> is mapped to boot_cpu_info which is a regular variable. There is a hideous
> accessor cpu_data() which tries to hide this, but it's not sufficient as
> some places require raw accessors and generates worse code than the regular
> per CPU accessors.
> 
> Waste sizeof(struct x86_cpuinfo) memory on UP and provide the per CPU
> cpu_info unconditionally. This requires to update the CPU info on the boot
> CPU as SMP does. (Ab)use the weakly defined smp_prepare_boot_cpu() function
> and implement exactly that.
> 
> This allows to use regular per CPU accessors uncoditionally and paves the
> way to remove the cpu_data() hackery.
> 
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>

This patch results in crashes when running the mainline kernel in qemu
with nosmp builds and Intel CPUs. The problem is _not_ seen on tag
x86-cleanups-2024-03-11; it is only seen in the mainline kernel. I didn't
check all of them, but it looks like AMD CPUs are not affected. The
initial bisect points to the merge of x86-cleanups-2024-03-11 into the
mainline kernel. I rebased x86-cleanups-2024-03-11 on top of the mainline
kernel; the second bisect uses that rebase as base. Reverting this patch
from the mainline kernel fixes the problem.

I don't know the code well enough to determine what is wrong.
Please let me know what I can do to help debugging the problem.

Thanks,
Guenter

----
crash log:

[    3.291087] BUG: unable to handle page fault for address: ffff9cd801f3f2a0
[    3.291087] #PF: supervisor write access in kernel mode
[    3.291087] #PF: error_code(0x0002) - not-present page
[    3.291087] PGD 60201067 P4D 60201067 PUD 0
[    3.291087] Oops: 0002 [#1] PREEMPT PTI
[    3.291087] CPU: 0 PID: 1 Comm: swapper Not tainted 6.8.0-06619-ge5e038b7ae9d #1
[    3.291087] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[    3.291087] RIP: 0010:rapl_cpu_online+0xf2/0x110
[    3.291087] Code: 05 ff 8e 07 03 40 42 0f 00 48 89 43 60 e8 56 5f 12 00 8b 15 b4 84 61 02 48 8b 05 01 8f 07 03 48 c7 83 90 00 00 00 e0 84 80 b6 <48> 89 9c d0 38 01 00 00 e9 2b ff ff ff b8 f4 ff ff ff e9 47 ff ff
[    3.291087] RSP: 0018:ffffa3d54001fdd0 EFLAGS: 00000246
[    3.291087] RAX: ffff9cd001f3f200 RBX: ffff9cd001fb34a8 RCX: 0000000000000000
[    3.291087] RDX: 00000000ffffffed RSI: 0000000000000001 RDI: ffff9cd001fb3550
[    3.291087] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[    3.291087] R10: 0000000000000001 R11: 0000000000018001 R12: 0000000000000000
[    3.291087] R13: 000000000000009e R14: ffffffffb6808180 R15: ffffffffb86710e5
[    3.291087] FS:  0000000000000000(0000) GS:ffffffffb8ab0000(0000) knlGS:0000000000000000
[    3.291087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.291087] CR2: ffff9cd801f3f2a0 CR3: 000000005e6a2000 CR4: 00000000001506f0
[    3.291087] Call Trace:
[    3.291087]  <TASK>
[    3.291087]  ? __die+0x1f/0x60
[    3.291087]  ? page_fault_oops+0x148/0x460
[    3.291087]  ? search_exception_tables+0x37/0x50
[    3.291087]  ? fixup_exception+0x21/0x320
[    3.291087]  ? exc_page_fault+0xca/0x150
[    3.291087]  ? asm_exc_page_fault+0x26/0x30
[    3.291087]  ? __pfx_rapl_cpu_online+0x10/0x10
[    3.291087]  ? rapl_cpu_online+0xf2/0x110
[    3.291087]  cpuhp_invoke_callback.constprop.0+0x117/0x3e0
[    3.291087]  __cpuhp_setup_state_cpuslocked+0x1b7/0x280
[    3.291087]  ? __pfx_rapl_cpu_online+0x10/0x10
[    3.291087]  rapl_pmu_init+0x189/0x2e0
[    3.291087]  ? __pfx_rapl_pmu_init+0x10/0x10
[    3.291087]  do_one_initcall+0x4f/0x210
[    3.291087]  kernel_init_freeable+0x166/0x290
[    3.291087]  ? __pfx_kernel_init+0x10/0x10
[    3.291087]  kernel_init+0x15/0x1b0
[    3.291087]  ret_from_fork+0x2f/0x50
[    3.291087]  ? __pfx_kernel_init+0x10/0x10
[    3.291087]  ret_from_fork_asm+0x19/0x30
[    3.291087]  </TASK>
[    3.291087] Modules linked in:
[    3.291087] CR2: ffff9cd801f3f2a0
[    3.291087] ---[ end trace 0000000000000000 ]---
[    3.291087] RIP: 0010:rapl_cpu_online+0xf2/0x110
[    3.291087] Code: 05 ff 8e 07 03 40 42 0f 00 48 89 43 60 e8 56 5f 12 00 8b 15 b4 84 61 02 48 8b 05 01 8f 07 03 48 c7 83 90 00 00 00 e0 84 80 b6 <48> 89 9c d0 38 01 00 00 e9 2b ff ff ff b8 f4 ff ff ff e9 47 ff ff
[    3.291087] RSP: 0018:ffffa3d54001fdd0 EFLAGS: 00000246
[    3.291087] RAX: ffff9cd001f3f200 RBX: ffff9cd001fb34a8 RCX: 0000000000000000
[    3.291087] RDX: 00000000ffffffed RSI: 0000000000000001 RDI: ffff9cd001fb3550
[    3.291087] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000001
[    3.291087] R10: 0000000000000001 R11: 0000000000018001 R12: 0000000000000000
[    3.291087] R13: 000000000000009e R14: ffffffffb6808180 R15: ffffffffb86710e5
[    3.291087] FS:  0000000000000000(0000) GS:ffffffffb8ab0000(0000) knlGS:0000000000000000
[    3.291087] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    3.291087] CR2: ffff9cd801f3f2a0 CR3: 000000005e6a2000 CR4: 00000000001506f0
[    3.291087] note: swapper[1] exited with irqs disabled
[    3.306047] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009

---
1st bisect:

# bad: [1bbeaf83dd7b5e3628b98bec66ff8fe2646e14aa] Merge tag 'perf-tools-for-v6.9-2024-03-13' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools
# good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
git bisect start 'HEAD' 'v6.8'
# bad: [9187210eee7d87eea37b45ea93454a88681894a4] Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad 9187210eee7d87eea37b45ea93454a88681894a4
# bad: [a01c9fe32378636ae65bec8047b5de3fdb2ba5c8] Merge tag 'nfsd-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect bad a01c9fe32378636ae65bec8047b5de3fdb2ba5c8
# bad: [691632f0e86973604678d193ccfa47b9614581aa] Merge tag 's390-6.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect bad 691632f0e86973604678d193ccfa47b9614581aa
# good: [8ede842f669b6f78812349bbef4d1efd0fbdafce] Merge tag 'rust-6.9' of https://github.com/Rust-for-Linux/linux
git bisect good 8ede842f669b6f78812349bbef4d1efd0fbdafce
# good: [bfdb395a7cde12d83a623949ed029b0ab38d765b] Merge tag 'x86_mtrr_for_v6.9_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good bfdb395a7cde12d83a623949ed029b0ab38d765b
# bad: [685d98211273f60e38a6d361b62d7016c545297e] Merge tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 685d98211273f60e38a6d361b62d7016c545297e
# good: [b0402403e54ae9eb94ce1cbb53c7def776e97426] Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
git bisect good b0402403e54ae9eb94ce1cbb53c7def776e97426
# good: [cb81deefb59de01325ab822f900c13941bfaf67f] x86/idle: Sanitize X86_BUG_AMD_E400 handling
git bisect good cb81deefb59de01325ab822f900c13941bfaf67f
# good: [73f0d1d7b4abb4a46bae1a0d8caf66e23d1138d0] Merge tag 'x86-asm-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good 73f0d1d7b4abb4a46bae1a0d8caf66e23d1138d0
# good: [65efc4dc12c5cc296374278673b89390eba79fe6] x86/cpu: Provide a declaration for itlb_multihit_kvm_mitigation
git bisect good 65efc4dc12c5cc296374278673b89390eba79fe6
# good: [d69ad12c786f0a4593c48c0658043aa4a5116b09] Merge tag 'x86-build-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d69ad12c786f0a4593c48c0658043aa4a5116b09
# good: [35ce64922c8263448e58a2b9e8d15a64e11e9b2d] x86/idle: Select idle routine only once
git bisect good 35ce64922c8263448e58a2b9e8d15a64e11e9b2d
# good: [774a86f1c885460ade4334b901919fa1d8ae6ec6] x86/nmi: Drop unused declaration of proc_nmi_enabled()
git bisect good 774a86f1c885460ade4334b901919fa1d8ae6ec6
# bad: [fcc196579aa1fc167d6778948bff69fae6116737] Merge tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad fcc196579aa1fc167d6778948bff69fae6116737
# first bad commit: [fcc196579aa1fc167d6778948bff69fae6116737] Merge tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

---
2nd bisect:

# bad: [6d70929c7019e265425f7a89cf72163a639d462e] x86/nmi: Drop unused declaration of proc_nmi_enabled()
# good: [d69ad12c786f0a4593c48c0658043aa4a5116b09] Merge tag 'x86-build-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect start 'HEAD' 'fcc196579aa1fc167d6778948bff69fae6116737~1'
# good: [5c157d25918ef15454c2a9a9262f7b763d9c9add] x86/msr: Add missing __percpu annotations
git bisect good 5c157d25918ef15454c2a9a9262f7b763d9c9add
# bad: [f5a6b1e2d92d825a0f73ca11e960795da336d99e] x86/uaccess: Add missing __force to casts in __access_ok() and valid_user_address()
git bisect bad f5a6b1e2d92d825a0f73ca11e960795da336d99e
# bad: [68907233f6d26a214bb79f892db8d999b15dda97] x86/percpu: Cure per CPU madness on UP
git bisect bad 68907233f6d26a214bb79f892db8d999b15dda97
# good: [053df18ceb928c8631042317a884b2842a457f1b] smp: Consolidate smp_prepare_boot_cpu()
git bisect good 053df18ceb928c8631042317a884b2842a457f1b
# first bad commit: [68907233f6d26a214bb79f892db8d999b15dda97] x86/percpu: Cure per CPU madness on UP




[Index of Archives]     [Newbies FAQ]     [LKML]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Trinity Fuzzer Tool]

  Powered by Linux