Re: [BUG] Soft lockup on powerpc when running arena selftests

On 11/7/24 16:46, Alexei Starovoitov wrote:
> On Thu, Nov 7, 2024 at 4:38 AM Viktor Malik <vmalik@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> I'm getting soft lockups when running the BPF arena selftests on powerpc
>> (ppc64le). The issue is 100% reproducible on the latest bpf-next with
>> `./test_progs -t arena`.
>>
>> A console snippet for one CPU lockup looks like this:
>>
>> [ 1124.671746] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/u34:0:58]
>> [ 1124.675554] CPU#1 Utilization every 4s during lockup:
>> [ 1124.675584]  #1: 100% system,          0% softirq,     0% hardirq,     0% idle
>> [ 1124.675621]  #2: 101% system,          0% softirq,     0% hardirq,     0% idle
>> [ 1124.675659]  #3: 100% system,          0% softirq,     0% hardirq,     0% idle
>> [ 1124.675696]  #4: 100% system,          0% softirq,     0% hardirq,     0% idle
>> [ 1124.675733]  #5: 101% system,          0% softirq,     0% hardirq,     0% idle
>> [ 1124.675770] Modules linked in: bpf_testmod(OE) bonding tls rfkill virtio_net net_failover vmx_crypto failover virtio_balloon crct10dif_vpmsum fuse loop nfnetlink zram vsock_loopback vmw_vsock_virtio_transport_common vsock virtio_blk crc32c_vpmsum virtio_console
>> [ 1124.675921] CPU: 1 UID: 0 PID: 58 Comm: kworker/u34:0 Tainted: G           OE      6.12.0-rc4+ #1
>> [ 1124.675975] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>> [ 1124.676005] Hardware name: IBM pSeries (emulated by qemu) POWER8E (raw) 0x4b0201 of:SLOF,HEAD hv:linux,kvm pSeries
>> [ 1124.676063] Workqueue: events_unbound bpf_map_free_deferred
>> [ 1124.676101] NIP:  c000000000551d3c LR: c000000000551c30 CTR: c0000000004733b0
>> [ 1124.676145] REGS: c000000008a37a20 TRAP: 0900   Tainted: G           OE       (6.12.0-rc4+)
>> [ 1124.676189] MSR:  800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 44082828  XER: 00000000
>> [ 1124.676251] CFAR: 0000000000000000 IRQMASK: 0
>> [ 1124.676251] GPR00: c000000000551c30 c000000008a37cc0 c00000000214f800 0000000000000000
>> [ 1124.676251] GPR04: 000000000000003b c00c00000044e3c8 0000000000000000 0000000000000000
>> [ 1124.676251] GPR08: 0000000000000000 0000000000000000 0000000058006001 0000000024082828
>> [ 1124.676251] GPR12: c0000000004733b0 c00000003ffff480 c0000000043cb7c0 c0000000043b1028
>> [ 1124.676251] GPR16: c008000305f78000 0000000000000000 0000000000000001 0000000000000000
>> [ 1124.676251] GPR20: fffffffffffffe7f c008000305f77fff c000000003cbe780 c000000001b26120
>> [ 1124.676251] GPR24: c000000003da0380 ff7fffffffffefbf c000000003cbe780 0000000000000001
>> [ 1124.676251] GPR28: c008000206000000 0000000000000000 c0000000004733b0 c00bf073759e8000
>> [ 1124.676627] NIP [c000000000551d3c] __apply_to_page_range+0x55c/0xea0
>> [ 1124.676667] LR [c000000000551c30] __apply_to_page_range+0x450/0xea0
>> [ 1124.676706] Call Trace:
>> [ 1124.676730] [c000000008a37cc0] [c000000000551c30] __apply_to_page_range+0x450/0xea0 (unreliable)
>> [ 1124.676784] [c000000008a37de0] [c000000000473360] arena_map_free+0x70/0xc0
> 
> Thanks for the report.
> I have no idea what's wrong with apply_to_page_range on ppc.
> Don't have any ppc to test and no debugging experience there.
> Unless ppc experts chime in, the only option is to ignore or disable.
> 

Thanks.

Disabling sounds better to me as we can still conveniently run
test_progs on ppc. Since some arena tests are quite hard to disable, the
easiest approach is to disable arena allocation on unsupported arches.

I sent the patch [1].

Viktor

[1] https://lore.kernel.org/bpf/20241115082548.74972-1-vmalik@xxxxxxxxxx/T/#u




