Re: [bisected] clang 16 built kernel crashes w. "BUG: kernel NULL pointer dereference, address: 00000007", gcc 13 built kernel with same config boots fine (6.7-rc1, x86_32)

On Wed, 15 Nov 2023 09:33:17 -0800
Roman Gushchin <roman.gushchin@xxxxxxxxx> wrote:

> Hm, interesting, so the issue is happening only with a kernel built with clang-16
> but not gcc? And you use 32-bit kernel? Do you know if it's reproducible on a
> 64-bit machine?

Correct. This only happens when I build the kernel with clang-16. A gcc-13 kernel built with the same .config is fine. That's why I first reported it at https://github.com/ClangBuiltLinux/linux/issues/1959

Surprisingly, I was able to reproduce the issue on my amd64 box as well! There, too, the gcc-13 build is fine and the clang-16 build crashes:

[...]
KASAN: maybe wild-memory-access in range [0xaaaaaaaaaaaaaab8-0xaaaaaaaaaaaaaabf]
CPU: 26 PID: 1 Comm: systemd Not tainted 6.7.0-rc1-Zen3 #1
Hardware name: To Be Filled By O.E.M. B450M Steel Legend/B450M Steel Legend, BIOS P8.01 03/14/2023
RIP: 0010:obj_cgroup_charge_pages+0x27/0x2d5
Code: 90 90 90 55 41 57 41 56 41 55 41 54 53 89 d5 41 89 f6 49 89 ff 48 b8 00 00 00 00 00 fc ff df 49 83 c7 10 4d 89 fd 49 c1 ed 03 <41> 80 7c 05 00 00 74 08 4c 89 ff e8 5e 3a fd ff 49 8b 1f 4c 8d 63
RSP: 0018:ffffc90000067a78 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: aaaaaaaaaaaaaaaa RCX: ffff8887df328b08
RDX: 000000000000000a RSI: 0000000000400cc0 RDI: aaaaaaaaaaaaaaaa
RBP: 000000000000000a R08: 3333333333333333 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8887df328b18
R13: 1555555555555557 R14: 0000000000400cc0 R15: aaaaaaaaaaaaaaba
FS:  00007fd18c5cb8c0(0000) GS:ffff8887df300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005614629e5098 CR3: 0000000108066000 CR4: 0000000000b50ef0
Call Trace:
 <TASK>
 ? __die_body+0x16/0x75
 ? die_addr+0x4a/0x70
 ? exc_general_protection+0x1c9/0x2d0
 ? cgroup_mkdir+0x455/0x9fb
 ? __x64_sys_mkdir+0x69/0x80
 ? asm_exc_general_protection+0x26/0x30
 ? obj_cgroup_charge_pages+0x27/0x2d5
 obj_cgroup_charge+0x114/0x1ab
 pcpu_alloc+0x1a6/0xa65
 ? mem_cgroup_css_alloc+0x1eb/0x1140
 ? cgroup_apply_control_enable+0x26b/0x7c0
 mem_cgroup_css_alloc+0x23f/0x1140
 cgroup_apply_control_enable+0x26b/0x7c0
 ? cgroup_kn_set_ugid+0x2d/0x1a0
 ? srso_alias_return_thunk+0x5/0xfbef5
 cgroup_mkdir+0x455/0x9fb
 ? __cfi_cgroup_mkdir+0x10/0x10
 kernfs_iop_mkdir+0x130/0x170
 vfs_mkdir+0x405/0x530
 do_mkdirat+0x188/0x1f0
 __x64_sys_mkdir+0x69/0x80
 do_syscall_64+0x7d/0x100
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? syscall_exit_to_user_mode+0x23/0xc0
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_syscall_64+0x89/0x100
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_syscall_64+0x89/0x100
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_syscall_64+0x89/0x100
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? do_syscall_64+0x89/0x100
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fd18c7216e7
Code: 00 66 90 48 89 f2 b9 00 01 00 00 48 89 fe bf 9c ff ff ff e9 1b cc ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 b8 53 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 01 c3 48 8b 15 19 47 0d 00 f7 d8 64 89 02 b8
RSP: 002b:00007ffd5d347128 EFLAGS: 00000246 ORIG_RAX: 0000000000000053
RAX: ffffffffffffffda RBX: 00005614628edf30 RCX: 00007fd18c7216e7
RDX: 0000000000000000 RSI: 00000000000001ed RDI: 00005614628fbd80
RBP: 00007ffd5d347170 R08: 000000000000000e R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fd18c8ce39a
R13: 00007ffd5d347140 R14: 00000000000000a0 R15: 00005614628c9560
 </TASK>
Modules linked in: efivarfs dmi_sysfs
---[ end trace 0000000000000000 ]---
RIP: 0010:obj_cgroup_charge_pages+0x27/0x2d5
Code: 90 90 90 55 41 57 41 56 41 55 41 54 53 89 d5 41 89 f6 49 89 ff 48 b8 00 00 00 00 00 fc ff df 49 83 c7 10 4d 89 fd 49 c1 ed 03 <41> 80 7c 05 00 00 74 08 4c 89 ff e8 5e 3a fd ff 49 8b 1f 4c 8d 63
RSP: 0018:ffffc90000067a78 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: aaaaaaaaaaaaaaaa RCX: ffff8887df328b08
RDX: 000000000000000a RSI: 0000000000400cc0 RDI: aaaaaaaaaaaaaaaa
RBP: 000000000000000a R08: 3333333333333333 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8887df328b18
R13: 1555555555555557 R14: 0000000000400cc0 R15: aaaaaaaaaaaaaaba
FS:  00007fd18c5cb8c0(0000) GS:ffff8887df300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005614629e5098 CR3: 0000000108066000 CR4: 0000000000b50ef0
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
Kernel Offset: 0x37000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Rebooting in 40 seconds..


Although the trace looks a bit different from the one on my 32-bit ThinkPad T60, it should be the same issue: reverting your patchset 'fixes' the clang-16 built kernel and the machine boots up OK.

> Completely speculative, but can you please check if the following patch
> resolves the problem?
>
> --
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 10917c3e1f03..a0df246e81f0 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1186,6 +1186,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
>  #ifdef CONFIG_MEMCG
>         tsk->active_memcg = NULL;
>  #endif
> +#ifdef CONFIG_MEMCG_KMEM
> +       tsk->objcg = NULL;
> +#endif
> 
>  #ifdef CONFIG_CPU_SUP_INTEL
>         tsk->reported_split_lock = 0;

Thanks for looking into this! Unfortunately the patch did not help. I have only tried it on my T60 so far, though, not on my amd64 box.

Also some data about my amd64 box:
 # inxi -bz
System:
  Kernel: 6.7.0-rc1-Zen3-dirty arch: x86_64 bits: 64 Console: pty pts/0
    Distro: Gentoo Base System release 2.14
Machine:
  Type: Desktop Mobo: ASRock model: B450M Steel Legend serial: <filter>
    UEFI: American Megatrends v: P8.01 date: 03/14/2023
CPU:
  Info: 16-core AMD Ryzen 9 5950X [MT MCP] speed (MHz): avg: 682
    min/max: 550/5084
Graphics:
  Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT]
    driver: amdgpu v: kernel
  Device-2: AMD RV516 [Radeon X1300/X1550 Series] driver: radeon v: kernel
  Display: x11 server: X.org v: 1.21.1.9 driver: X: loaded: amdgpu
    unloaded: fbdev,modesetting,radeon dri: radeonsi gpu: amdgpu,radeon
    resolution: <missing: xdpyinfo/xrandr> resolution: 1: 3840x2160
    2: 3840x2160
  API: OpenGL v: 4.5 Mesa 23.1.8 renderer: llvmpipe (LLVM 16.0.6 256 bits)
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    driver: r8169


Full dmesg attached (1. without KASAN, 2. with KASAN), along with the amd64 kernel .config.

Regards,
Erhard

Attachment: dmesg_67-rc1_zen3_01
Description: Binary data

Attachment: dmesg_67-rc1_zen3_02
Description: Binary data

Attachment: config_67-rc1_zen3
Description: Binary data

