Hi,

I am adding Jason into Cc. I wonder if the softlockup might be caused
by a lack of entropy.

On Wed 2025-01-22 10:28:52, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![modprobe:#]" on:
>
> commit: b63e6f60eab45b16a1bf734fef9035a4c4187cd5 ("serial: 8250: Switch to nbcon console")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> [test failed on linux-next/master 0907e7fb35756464aa34c35d6abb02998418164b]
>
> in testcase: kunit
> version:
> with following parameters:
>
> group: group-01
>
>
>
> config: x86_64-rhel-9.4-kunit
> compiler: gcc-12
> test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz (Haswell) with 16G memory
>
> (please refer to attached dmesg/kmsg for entire log/backtrace)
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202501221029.fb0d574d-lkp@xxxxxxxxx
>
>
> [ 231.759560][ C3] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [modprobe:3860]
> [ 231.759572][ C3] Modules linked in: test_rslib(+) reed_solomon ipmi_devintf ipmi_msghandler intel_rapl_msr intel_rapl_common btrfs snd_hda_codec_hdmi x86_pkg_temp_thermal intel_powerclamp blake2b_generic coretemp xor raid6_pq libcrc32c kvm_intel snd_hda_codec_realtek snd_hda_codec_generic platform_profile i915 kvm snd_hda_scodec_component snd_hda_intel sd_mod snd_intel_dspcfg dell_wmi crc32_generic snd_intel_sdw_acpi sg crct10dif_pclmul cec crc32_pclmul dell_smbios snd_hda_codec intel_gtt crc32c_intel dell_wmi_descriptor ghash_clmulni_intel sparse_keymap snd_hda_core ttm snd_hwdep ahci rapl rfkill drm_display_helper snd_pcm mei_wdt libahci intel_cstate dcdbas snd_timer mei_me libata intel_uncore drm_kms_helper snd pcspkr drm_buddy mei soundcore video wmi binfmt_misc drm fuse loop dm_mod ip_tables poly1305_generic chacha_generic [last unloaded: test_fpu]
> [ 231.759681][ C3] CPU: 3 UID: 0 PID: 3860 Comm: modprobe Tainted: G S B N 6.13.0-rc3-00034-gb63e6f60eab4 #1
> [ 231.759690][ C3] Tainted: [S]=CPU_OUT_OF_SPEC, [B]=BAD_PAGE, [N]=TEST
> [ 231.759694][ C3] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013
> [ 231.759699][ C3] RIP: 0010:encode_rs16 (lib/reed_solomon/encode_rs.c:33) reed_solomon
> [ 231.759708][ C3] Code: 87 68 83 00 00 89 da d3 fa 41 0f b6 4d 00 41 38 cc 7c 08 84 c9 0f 85 64 02 00 00 8b 75 04 21 f3 01 d3 39 de 7e c0 48 8b 3c 24 <48> 63 db 48 8d 1c 5f 48 89 d9 48 c1 e9 03 42 0f b6 34 39 48 89 d9
> All code
> ========
> 0: 87 68 83 xchg %ebp,-0x7d(%rax)
> 3: 00 00 add %al,(%rax)
> 5: 89 da mov %ebx,%edx
> 7: d3 fa sar %cl,%edx
> 9: 41 0f b6 4d 00 movzbl 0x0(%r13),%ecx
> e: 41 38 cc cmp %cl,%r12b
> 11: 7c 08 jl 0x1b
> 13: 84 c9 test %cl,%cl
> 15: 0f 85 64 02 00 00 jne 0x27f
> 1b: 8b 75 04 mov 0x4(%rbp),%esi
> 1e: 21 f3 and %esi,%ebx
> 20: 01 d3 add %edx,%ebx
> 22: 39 de cmp %ebx,%esi
> 24: 7e c0 jle 0xffffffffffffffe6
> 26: 48 8b 3c 24 mov (%rsp),%rdi
> 2a:* 48 63 db movslq %ebx,%rbx <-- trapping instruction
> 2d: 48 8d 1c 5f lea (%rdi,%rbx,2),%rbx
> 31: 48 89 d9 mov %rbx,%rcx
> 34: 48 c1 e9 03 shr $0x3,%rcx
> 38: 42 0f b6 34 39 movzbl (%rcx,%r15,1),%esi
> 3d: 48 89 d9 mov %rbx,%rcx
>
> Code starting with the faulting instruction
> ===========================================
> 0: 48 63 db movslq %ebx,%rbx
> 3: 48 8d 1c 5f lea (%rdi,%rbx,2),%rbx
> 7: 48 89 d9 mov %rbx,%rcx
> a: 48 c1 e9 03 shr $0x3,%rcx
> e: 42 0f b6 34 39 movzbl (%rcx,%r15,1),%esi
> 13: 48 89 d9 mov %rbx,%rcx
> [ 231.759717][ C3] RSP: 0018:ffffc90000abf3b0 EFLAGS: 00000297
> [ 231.759723][ C3] RAX: ffff888102c9ff0a RBX: 00000000000000dd RCX: 0000000000000000
> [ 231.759728][ C3] RDX: 0000000000000000 RSI: 00000000000000ff RDI: ffff88816b6b7c00
> [ 231.759733][ C3] RBP: ffff88812901fb00 R08: 00000000000000c8 R09: ffff88816c8b518e
> [ 231.759738][ C3] R10: 1ffff11025203f60 R11: ffff88816c8b5184 R12: 0000000000000007
> [ 231.759743][ C3] R13: ffffed1025203f60 R14: ffffed1025203f60 R15: dffffc0000000000
> [ 231.759748][ C3] FS: 00007f64c760f040(0000) GS:ffff8883a7d80000(0000) knlGS:0000000000000000
> [ 231.759754][ C3] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 231.759759][ C3] CR2: 00007f024b693000 CR3: 00000001d4462004 CR4: 00000000001726f0
> [ 231.759764][ C3] DR0: ffffffff8789050c DR1: ffffffff8789050d DR2: ffffffff8789050e
> [ 231.759769][ C3] DR3: ffffffff8789050f DR6: 00000000fffe0ff0 DR7: 0000000000000600
> [ 231.759774][ C3] Call Trace:
> [ 231.759778][ C3] <IRQ>
> [ 231.759782][ C3] ? watchdog_timer_fn (kernel/watchdog.c:770)
> [ 231.759790][ C3] ? __pfx_watchdog_timer_fn (kernel/watchdog.c:685)
> [ 231.759796][ C3] ? __hrtimer_run_queues (kernel/time/hrtimer.c:1739 kernel/time/hrtimer.c:1803)
> [ 231.759803][ C3] ? __pfx___hrtimer_run_queues (kernel/time/hrtimer.c:1773)
> [ 231.759808][ C3] ? ktime_get_update_offsets_now (kernel/time/timekeeping.c:312 (discriminator 3) kernel/time/timekeeping.c:335 (discriminator 3) kernel/time/timekeeping.c:2457 (discriminator 3))
> [ 231.759814][ C3] ? sched_clock (arch/x86/include/asm/preempt.h:94 arch/x86/kernel/tsc.c:286)
> [ 231.759821][ C3] ? hrtimer_interrupt (kernel/time/hrtimer.c:1868)
> [ 231.759828][ C3] ? __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:36 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1056)
> [ 231.759835][ C3] ? sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
> [ 231.759842][ C3] </IRQ>
> [ 231.759845][ C3] <TASK>
> [ 231.759848][ C3] ? asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
> [ 231.759857][ C3] ? encode_rs16 (lib/reed_solomon/encode_rs.c:33) reed_solomon
> [ 231.759864][ C3] get_rcw_we (lib/reed_solomon/test_rslib.c:173) test_rslib

Honestly, I do not really see how this could be related to the serial
console. The lockup is reported in test_rslib, which is a module for
testing the Generic Reed Solomon encoder / decoder library. It seems
to do a lot of computation and needs a lot of random numbers.

I wonder if there is not enough entropy, which makes the test too slow.