Re: [PATCH 1/1] mm: disable CONFIG_PER_VMA_LOCK by default until its fixed

On Tue, Jul 04, 2023 at 11:34:27PM +0200, Holger Hoffstätte wrote:
> I applied the fix and did a clean rebuild. The first attempt to boot resulted in
> the following oops, though it kind of continued:

It would be helpful to run this through scripts/decode_stacktrace.sh.

> Jul  4 22:35:22 hho kernel: BUG: kernel NULL pointer dereference, address: 0000000000000052
> Jul  4 22:35:22 hho kernel: #PF: supervisor read access in kernel mode
> Jul  4 22:35:22 hho kernel: #PF: error_code(0x0000) - not-present page
> Jul  4 22:35:22 hho kernel: PGD 0 P4D 0
> Jul  4 22:35:22 hho kernel: Oops: 0000 [#1] SMP
> Jul  4 22:35:22 hho kernel: CPU: 10 PID: 1740 Comm: start-stop-daem Not tainted 6.4.1 #1
> Jul  4 22:35:22 hho kernel: Hardware name: LENOVO 20U50001GE/20U50001GE, BIOS R19ET32W (1.16 ) 01/26/2021
> Jul  4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0
> Jul  4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b

Faulting insn:

   0:	4c 8b 70 48          	mov    0x48(%rax),%r14

and rax is 0xa, so the load from 0x48(%rax) reads 0xa + 0x48 = 0x52,
which matches the faulting address in CR2.
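
For anyone who wants to check that decoding by hand rather than with
objdump, here is a minimal Python sketch.  It handles only this one
encoding (REX.W prefix, opcode 8b, ModRM with mod=01 and no SIB byte),
the helper names are made up for illustration and are not from the kernel
tree, and the CODE string is just the tail of the "Code:" line quoted
above.  The RAX value is taken from the register dump.

```python
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Tail of the "Code:" line from the oops; <4c> marks the faulting byte.
CODE = "e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3"


def faulting_bytes(code_line):
    """Return the bytes of a "Code:" line starting at the <xx> marker."""
    tokens = code_line.split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            tokens[i] = tok[1:-1]
            return [int(t, 16) for t in tokens[i:]]
    raise ValueError("no <..> marker found")


def decode_mov_disp8(b):
    """Decode only 'REX.W 8b /r' with an 8-bit displacement (hypothetical helper)."""
    rex, opcode, modrm, disp = b[0], b[1], b[2], b[3]
    if rex & 0xF0 != 0x40 or not rex & 0x08:
        raise ValueError("expected a REX.W prefix")
    if opcode != 0x8B:
        raise ValueError("expected MOV r64, r/m64")
    if modrm >> 6 != 0b01 or modrm & 7 == 4:
        raise ValueError("expected mod=01 without a SIB byte")
    reg = ((modrm >> 3) & 7) | ((rex & 0x04) << 1)  # REX.R extends reg
    rm = (modrm & 7) | ((rex & 0x01) << 3)          # REX.B extends r/m
    if disp >= 0x80:                                # disp8 is signed
        disp -= 0x100
    return REGS[reg], REGS[rm], disp


dest, base, disp = decode_mov_disp8(faulting_bytes(CODE))
rax = 0xA                      # RAX from the register dump
fault_addr = rax + disp        # effective address of the load
print(f"mov {disp:#x}(%{base}),%{dest} -> fault at {fault_addr:#x}")
```

Comparing fault_addr against CR2 is a quick sanity check that the marked
byte really is the start of the instruction that faulted.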

I'm not sure this is related to the VMA patches.  It might be something
unrelated that doesn't often come up?

> Jul  4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> Jul  4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> Jul  4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> Jul  4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> Jul  4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> Jul  4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul  4 22:35:22 hho kernel: FS:  00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> Jul  4 22:35:22 hho kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul  4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> Jul  4 22:35:22 hho kernel: Call Trace:
> Jul  4 22:35:22 hho kernel:  <TASK>
> Jul  4 22:35:22 hho kernel:  ? __die+0x1f/0x60
> Jul  4 22:35:22 hho kernel:  ? page_fault_oops+0x14d/0x410
> Jul  4 22:35:22 hho kernel:  ? xa_load+0x82/0xa0
> Jul  4 22:35:22 hho kernel:  ? exc_page_fault+0x60/0x100
> Jul  4 22:35:22 hho kernel:  ? asm_exc_page_fault+0x22/0x30
> Jul  4 22:35:22 hho kernel:  ? wq_worker_comm+0x63/0xc0
> Jul  4 22:35:22 hho last message buffered 1 times
> Jul  4 22:35:22 hho kernel:  proc_task_name+0xa4/0xb0
> Jul  4 22:35:22 hho kernel:  ? seq_put_decimal_ull_width+0x96/0x100
> Jul  4 22:35:22 hho kernel:  do_task_stat+0x44b/0xe10
> Jul  4 22:35:22 hho kernel:  proc_single_show+0x4b/0xa0
> Jul  4 22:35:22 hho kernel:  seq_read_iter+0xff/0x410
> Jul  4 22:35:22 hho kernel:  ? generic_fillattr+0x45/0xf0
> Jul  4 22:35:22 hho kernel:  seq_read+0x93/0xb0
> Jul  4 22:35:22 hho kernel:  vfs_read+0x9b/0x2c0
> Jul  4 22:35:22 hho kernel:  ? __do_sys_newfstatat+0x22/0x30
> Jul  4 22:35:22 hho kernel:  ksys_read+0x53/0xc0
> Jul  4 22:35:22 hho kernel:  do_syscall_64+0x35/0x80
> Jul  4 22:35:22 hho kernel:  entry_SYSCALL_64_after_hwframe+0x46/0xb0
> Jul  4 22:35:22 hho kernel: RIP: 0033:0x7f39ddf5877d
> Jul  4 22:35:22 hho kernel: Code: b9 fe ff ff 48 8d 3d 1a 71 0a 00 50 e8 2c 12 02 00 66 2e 0f 1f 84 00 00 00 00 00 66 90 80 3d 81 4c 0e 00 00 74 17 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 5b c3 66 2e 0f 1f 84 00 00 00 00 00 53 48 83
> Jul  4 22:35:22 hho kernel: RSP: 002b:00007ffe4b98b6f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> Jul  4 22:35:22 hho kernel: RAX: ffffffffffffffda RBX: 00005655194cab40 RCX: 00007f39ddf5877d
> Jul  4 22:35:22 hho kernel: RDX: 0000000000000400 RSI: 00005655194ccd30 RDI: 0000000000000004
> Jul  4 22:35:22 hho kernel: RBP: 00007ffe4b98b760 R08: 00007f39ddff8cb2 R09: 0000000000000001
> Jul  4 22:35:22 hho kernel: R10: 0000000000001000 R11: 0000000000000246 R12: 00007f39de0324a0
> Jul  4 22:35:22 hho kernel: R13: 00005655194cd140 R14: 0000000000000a68 R15: 00007f39de031ba0
> Jul  4 22:35:22 hho kernel:  </TASK>
> Jul  4 22:35:22 hho kernel: Modules linked in: mousedev sch_fq_codel bpf_preload snd_ctl_led amdgpu iwlmvm snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi mac80211 pkcs8_key_parser drm_ttm_helper ttm iommu_v2 gpu_sched snd_hda_intel libarc4 i2c_algo_bit snd_intel_dspcfg drm_buddy drm_suballoc_helper uvcvideo snd_hda_codec drm_display_helper edac_mce_amd videobuf2_vmalloc snd_hwdep crct10dif_pclmul videobuf2_memops uvc crc32_pclmul cec snd_hda_core crc32c_intel videobuf2_v4l2 ghash_clmulni_intel lm92 r8169 sha512_ssse3 snd_pcm videodev psmouse thinkpad_acpi iwlwifi drivetemp ledtrig_audio drm_kms_helper rapl videobuf2_common realtek snd_timer serio_raw snd_rn_pci_acp3x wmi_bmof platform_profile cfg80211 mc snd_acp_config k10temp snd syscopyarea mdio_devres ucsi_acpi snd_soc_acpi sysfillrect drm snd_pci_acp3x i2c_piix4 sysimgblt soundcore typec_ucsi ipmi_devintf rfkill roles libphy ipmi_msghandler typec video battery ac wmi i2c_scmi button
> Jul  4 22:35:22 hho kernel: CR2: 0000000000000052
> Jul  4 22:35:22 hho kernel: ---[ end trace 0000000000000000 ]---
> Jul  4 22:35:22 hho kernel: RIP: 0010:wq_worker_comm+0x63/0xc0
> Jul  4 22:35:22 hho kernel: Code: 43 2c 20 75 1d 5b 5d 48 c7 c7 e0 a4 43 82 41 5c 41 5d 41 5e e9 7e 6b 8b 00 5b 5d 41 5c 41 5d 41 5e c3 48 89 df e8 ad 35 00 00 <4c> 8b 70 48 48 89 c3 4d 85 f6 74 cf 4c 89 f7 e8 29 b6 8b 00 80 7b
> Jul  4 22:35:22 hho kernel: RSP: 0018:ffffc90000fb7bb8 EFLAGS: 00010202
> Jul  4 22:35:22 hho kernel: RAX: 000000000000000a RBX: ffff88810cd43300 RCX: 0001020304050608
> Jul  4 22:35:22 hho kernel: RDX: ffff88811395bfc0 RSI: 7fffffffffffffff RDI: ffff88810cd43300
> Jul  4 22:35:22 hho kernel: RBP: 000000000000000f R08: ffffc90000fb7be8 R09: 0000000000000040
> Jul  4 22:35:22 hho kernel: R10: fefefefefefefeff R11: 0000000000000040 R12: ffffc90000fb7be8
> Jul  4 22:35:22 hho kernel: R13: 0000000000000040 R14: 000000000000000c R15: 0000000000000001
> Jul  4 22:35:22 hho kernel: FS:  00007f39dde1c740(0000) GS:ffff8887ef680000(0000) knlGS:0000000000000000
> Jul  4 22:35:22 hho kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jul  4 22:35:22 hho kernel: CR2: 0000000000000052 CR3: 0000000112188000 CR4: 0000000000350ee0
> Jul  4 22:35:22 hho kernel: note: start-stop-daem[1740] exited with irqs disabled
> Jul  4 22:35:22 hho kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
> Jul  4 22:35:22 hho kernel: r8169 0000:02:00.0 eth0: Link is Down
> Jul  4 22:35:24 hho kernel: r8169 0000:02:00.0 eth0: Link is Up - 1Gbps/Full - flow control rx/tx
> Jul  4 22:35:24 hho kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> 
> It then kind of limped along until I rebooted again. This second attempt to boot
> died and locked up completely, again during amdgpu initialization, and is on display here:
> https://imgur.com/a/3ZE66kh

refill_obj_stock() is also somewhat unrelated to VMA stuff.  This is
all very bizarre.

> Finally I just edited mm/Kconfig and set config PER_VMA_LOCK to "defbool n" to override
> any setting in my old config. That made everything work again - it's what I'm using now.

Could I ask you to try a few boots with PER_VMA_LOCK set to "n", just
to eliminate the possibility that this is a coincidence?