Re: linux 5.12 - fails to boot - soft lockup - CPU#0 stuck for 23s! - RIP smp_call_function_single

On 5/17/21 6:27 AM, Christoph Hellwig wrote:
> Any information of the system?  What block driver(s) do you use, how
> many CPUs, kernel config?
> 

Hey Chris

I see that Markus followed up with:

====
Well, turns out I should've googled (or at least looked at the bcache wiki entry) at first, which points to a known bug involving bcache and 5.12: https://www.spinics.net/lists/linux-bcache/msg10077.html

I still find it interesting that I get the same symptoms that James describes, but other than that the issues don't seem to be related.
====

For my part, I also had to re-run my bisect, with more thorough testing.  The result changed, and we are currently investigating the commit the bisect ended on, 4f432e8bb15b ("x86/mce: Get rid of mcheck_intel_therm_init()").

So now, I expect that my issue has nothing to do with your patch set.  Sorry about the noise.  If you still have an interest in my issue, there are posts going to linux-smp and lkml.

James

> On Fri, May 14, 2021 at 12:39:59PM -0600, James Feeney wrote:
>> With the patch to kernel/smp.c in linux 5.12.4, "smp: Fix smp_call_function_single_async prototype", by Arnd Bergmann, I thought maybe there was a fix.  But no.  The error is the same, except the top of the Call Trace is different:
>>
>> ...
>> watchdog: BUG: soft lockup - CPU#0 stuck for 23s! ...
>> ...
>> RIP: 0010:smp_call_function_single+0xeb/0x130
>> ...
>> Call Trace:
>> ? text_poke_loc_init+0x160/0x160
>> ? text_poke_loc_init+0x160/0x160
>> on_each_cpu+0x39/0x90
>> ...
>>
>> and repeats indefinitely.
>>
>> Again, smp_call_function_single is defined in kernel/smp.c
>>
>> It seems that my git bisect is probably off, since apparently the system may sometimes boot to a temporarily working state, and some "exercise" is needed to identify the failure.  However, see another git bisect for possibly the same issue at
>>
>>  https://bugs.archlinux.org/task/70663#comment199765
>>
>> with "bisect-result.txt"
>>
>>  https://bugs.archlinux.org/task/70663?getfile=20255
>>
>> Markus says, in part:
>>
>> ====
>> Trying to bisect, I arrived at a different set of commits though.
>> 7a800a20ae6329e803c5c646b20811a6ae9ca136 showed the issue described, where a seemingly working kernel will lock up rather quickly.
>> f007a3d66c5480c8dae3fa20a89a06861ef1f5db worked flawlessly, without any hiccups doing random internet browsing while I was compiling the next bisect step.
>> However, there are six commits between those, that did not boot and left me stuck with a black screen right after the bootloader (so no systemd startup message or similar). The system did not react to any inputs (Alt+SysRq) or to a short press of the PC's power button, and thus a hard shutdown was necessary.
>> ====
>>
>> These 8 commits - total - are from Christoph Hellwig, 2021 Feb 02.  Perhaps something closer to the real issue is in there.  As with Markus, I've also noticed that a "warm" reboot can result in a frozen system immediately after the boot loader has run.  A full power-off reboot is needed to get past the early screen initialization.
>>
>> I'll have to re-do my git bisect, with more extensive system "exercise", to see if something more useful results.
>>
>> James
> ---end quoted text---
> 


