Re: [linux-next:master] [serial] b63e6f60ea: BUG:soft_lockup-CPU##stuck_for#s![modprobe:#]

Hi John!

On Mon, 17 Mar 2025 09:51:46 +0106, John Ogness wrote:
>On 2025-03-15, Ryo Takakura <ryotkkr98@xxxxxxxxx> wrote:
>> I got the same softlockup during the test regardless of the presence
>> of the commits.
>>
>> [   60.222013] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/0:1]
>> [   60.222023] Modules linked in:
>> [   60.222032] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Tainted: G             L     6.14.0-rc6-v14-rc6-voluntary+ #4
>> [   60.222047] Tainted: [L]=SOFTLOCKUP
>> [   60.222051] Hardware name: Raspberry Pi 4 Model B Rev 1.5 (DT)
>> [   60.222055] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>> [   60.222066] pc : get_random_u32+0xac/0x118
>> [   60.222081] lr : __get_random_u32_below+0x20/0x78
>> [   60.222094] sp : ffffffc08002bb80
>> [   60.222098] x29: ffffffc08002bb80 x28: 0000000000000003 x27: 0000000000000001
>> [   60.222114] x26: ffffff804112e6a4 x25: ffffffd33ed21820 x24: ffffff804112e69c
>> [   60.222128] x23: 0000000000000000 x22: ffffff804112e64e x21: 0000000000000000
>> [   60.222142] x20: 000000000000000d x19: ffffff80fb7aebb8 x18: 0000000000000002
>> [   60.222156] x17: 0000000000000004 x16: ffffff804112e584 x15: ffffff8041126796
>> [   60.222169] x14: ffffff80411267c0 x13: 0000000000000006 x12: ffffff804112e5c0
>> [   60.222183] x11: ffffff804112e64c x10: 0000000000000007 x9 : ffffffd33dccdd10
>> [   60.222196] x8 : ffffff804112e6a8 x7 : 0000000000000000 x6 : 0005000400060005
>> [   60.222210] x5 : ffffff804112e65a x4 : 0000000000000000 x3 : 0000000000000010
>> [   60.222223] x2 : 0000000000000014 x1 : 000000002c7d0b7a x0 : 0000000000000013
>> [   60.222237] Call trace:
>> [   60.222241]  get_random_u32+0xac/0x118 (P)
>> [   60.222256]  __get_random_u32_below+0x20/0x78
>> [   60.222268]  get_rcw_we+0x180/0x208
>> [   60.222278]  test_rslib_init+0x2c8/0xba0
>> [   60.222292]  do_one_initcall+0x4c/0x210
>> [   60.222303]  kernel_init_freeable+0x1fc/0x3a0
>> [   60.222317]  kernel_init+0x28/0x1f8
>> [   60.222327]  ret_from_fork+0x10/0x20
>>
>>> I wonder if a cond_resched() in some loop would help. Or using a
>>
>> I wasn't sure which loop would be the appropriate one but adding
>> cond_resched() as below worked as suggested.
>>
>> ----- BEGIN -----
>> diff --git a/lib/reed_solomon/test_rslib.c b/lib/reed_solomon/test_rslib.c
>> index 75cb1adac..322d7b0a8 100644
>> --- a/lib/reed_solomon/test_rslib.c
>> +++ b/lib/reed_solomon/test_rslib.c
>> @@ -306,6 +306,8 @@ static void test_uc(struct rs_control *rs, int len, int errs,
>>
>>                 if (memcmp(r, c, len * sizeof(*r)))
>>                         stat->dwrong++;
>> +
>> +               cond_resched();
>>         }
>>         stat->nwords += trials;
>>  }
>> @@ -400,6 +402,8 @@ static void test_bc(struct rs_control *rs, int len, int errs,
>>                 } else {
>>                         stat->rfail++;
>>                 }
>> +
>> +               cond_resched();
>>         }
>>         stat->nwords += trials;
>>  }
>> ----- END -----
>>
>>> pseudorandom generator. I remember the problems related to much
>>> slower random generator, for example, see the commit
>>> f900fde28883602b6 ("crypto: testmgr - fix RNG performance in fuzz
>>> tests").
>>
>> I haven't tested this but I'll look into it!
>>
>>> That said, I did not dig deep into the code. And I did not try to
>>> reproduce the softlockup. I am pretty busy at the moment with some
>>> other stuff. I just wanted to give it a look and share my opinion.
>>
>> I think the softlockup is rather a problem with the test itself,
>> not with the two commits.
>
>Thanks Ryo for looking into this! I think we need to have a technical
>explanation/understanding of the problem so that it is clear how my
>series triggers or exacerbates the issue.

I see. I can't say anything definite yet, but I'm still not sure whether
your series has anything to do with the softlockup... I found that there
was a similar report in the past [0].

Anyway, I will look into the rslib test itself more carefully to see
if I can provide the needed technical explanation for the problem!

Sincerely,
Ryo Takakura

[0] https://lore.kernel.org/linux-kernel//a309f09e07374e62a77bb84c70e6715efd288448.1583122776.git.planteen@xxxxxxxxx/

>John
