Re: [PATCH v5] parisc: Fix spinlock barriers

Hi Rolf,

On 31.08.20 18:21, Rolf Eike Beer wrote:
> These things are in 5.8.4 AFAICT, and the lockups are still there:

Thanks for testing!
 
> [320616.602705] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [hppa2.0-unknown:29093]
> [320616.602705] Modules linked in: 8021q ipmi_poweroff ipmi_si ipmi_devintf sata_via ipmi_msghandler cbc dm_zero dm_snapshot dm_mirror dm_region_hash dm_log dm_crypt dm_bufio pata_sil680 libata
> [320616.602705] CPU: 0 PID: 29093 Comm: hppa2.0-unknown Not tainted 5.8.4-gentoo-parisc64 #1
> [320616.602705] Hardware name: 9000/785/C8000
>...
> [320616.602705] IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000402706d0 00000000402706d4
> [320616.602705]  IIR: 0ff0109c    ISR: 000000005836f8a0  IOR: 0000000000000001
> [320616.602705]  CPU:        0   CR30: 0000004083878000 CR31: ffffffffffffffff
> [320616.602705]  ORIG_R28: 0000000000000801
> [320616.602705]  IAOQ[0]: smp_call_function_many_cond+0x490/0x500
> [320616.602705]  IAOQ[1]: smp_call_function_many_cond+0x494/0x500
> [320616.602705]  RP(r2): smp_call_function_many_cond+0x468/0x500
> [320616.602705] Backtrace:
> [320616.602705]  [<0000000040270824>] on_each_cpu+0x5c/0x98
> [320616.602705]  [<0000000040186a0c>] flush_tlb_all+0x204/0x228
> [320616.602705]  [<00000000402ef1f8>] tlb_finish_mmu+0x1d8/0x210
> [320616.602705]  [<00000000402eb820>] exit_mmap+0x1d8/0x370
> [320616.602705]  [<00000000401b5ec0>] mmput+0xe8/0x260
> [320616.602705]  [<00000000401c1690>] do_exit+0x558/0x12e8
> [320616.602705]  [<00000000401c3f18>] do_group_exit+0x50/0x118
> [320616.602705]  [<00000000401c4000>] sys_exit_group+0x20/0x28
> [320616.602705]  [<0000000040192018>] syscall_exit+0x0/0x14

I agree. I have seen the same stall too.
I think we should try to analyze how the stall in smp_call_function_many_cond()
can happen. The trace always seems to point to do_exit().
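
To check whether further reports really do all go through do_exit(), the backtrace frames can be pulled out of a saved dmesg log. The following is only a minimal sketch: the helper names backtrace_symbols() and passes_through() are made up for illustration, and it assumes the frame layout shown in the traces in this mail ("[<address>] symbol+0xoff/0xsize").

```python
import re

# Matches backtrace frames such as:
#   [320616.602705]  [<00000000401c1690>] do_exit+0x558/0x12e8
# A leading "> " quote marker is ignored because search() scans the whole line.
FRAME_RE = re.compile(r"\[<[0-9a-f]+>\]\s+([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def backtrace_symbols(log_text):
    """Return the function names of all backtrace frames, in log order."""
    symbols = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            symbols.append(match.group(1))
    return symbols


def passes_through(log_text, symbol):
    """Return True if any backtrace frame in the log belongs to `symbol`."""
    return symbol in backtrace_symbols(log_text)
```

Calling passes_through(log_text, "do_exit") on each saved report would show quickly whether a stall belongs to the do_exit() group or to the older kind.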

I think these patches from Linus helped with the "old kind of stalls" which we have had over the last months/years:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c6fe44d96fc1536af5b11cd859686453d1b7bfd1 and 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a9127fcf2296674d58024f83981f40b128fffea
Those old stalls looked something like this and did not point to do_exit():

[111395.307021] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[111395.311001] rcu:    3-...0: (1 GPs behind) idle=04e/1/0x4000000000000000 softirq=13650053/13650054 fqs=2625
[111395.311001]         (detected by 0, t=5252 jiffies, g=25258025, q=1240)
[111395.311001] Task dump for CPU 3:
[111395.311001] init            R  running task        0     1      0 0x00000016
[111395.311001] Backtrace:
[111395.311001]  [<0000000040416110>] hrtimer_try_to_cancel+0x13c/0x1f8
[111395.311001]  [<0000000040e65a18>] schedule_hrtimeout_range_clock+0x10c/0x1b8
[111395.311001]  [<0000000040e65af4>] schedule_hrtimeout_range+0x30/0x60
[111395.311001]  [<0000000040e5ebbc>] _cond_resched+0x40/0xb8
[111395.311001]  [<00000000403617a4>] get_signal+0x348/0xf00
[111395.311001]  [<000000004031cec0>] do_signal+0x54/0x230
[111395.311001]  [<000000004031d0f8>] do_notify_resume+0x5c/0x164
[111395.311001]  [<0000000040310110>] syscall_do_signal+0x54/0xa0
[111395.311001]  [<000000004030f074>] intr_return+0x0/0xc
[111395.311001]
[111458.330562] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[111458.330562] rcu:    3-...0: (1 GPs behind) idle=04e/1/0x4000000000000000 softirq=13650053/13650054 fqs=4653
[111458.330562]         (detected by 0, t=21007 jiffies, g=25258025, q=1361)
[111458.330562] Task dump for CPU 3:
[111458.330562] init            R  running task        0     1      0 0x00000016
...

Helge
