On 11/7/24 16:46, Alexei Starovoitov wrote:
> On Thu, Nov 7, 2024 at 4:38 AM Viktor Malik <vmalik@xxxxxxxxxx> wrote:
>>
>> Hi,
>>
>> I'm getting soft lockups when running the BPF arena selftests on powerpc
>> (ppcle64). The issue is 100% reproducible on the latest bpf-next with
>> `./test_progs -t arena`.
>>
>> A console snippet for one CPU lockup looks like this:
>>
>> [ 1124.671746] watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [kworker/u34:0:58]
>> [ 1124.675554] CPU#1 Utilization every 4s during lockup:
>> [ 1124.675584] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
>> [ 1124.675621] #2: 101% system, 0% softirq, 0% hardirq, 0% idle
>> [ 1124.675659] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
>> [ 1124.675696] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
>> [ 1124.675733] #5: 101% system, 0% softirq, 0% hardirq, 0% idle
>> [ 1124.675770] Modules linked in: bpf_testmod(OE) bonding tls rfkill virtio_net net_failover vmx_crypto failover virtio_balloon crct10dif_vpmsum fuse loop nfnetlink zram vsock_loopback vmw_vsock_virtio_transport_common vsock virtio_blk crc32c_vpmsum virtio_console
>> [ 1124.675921] CPU: 1 UID: 0 PID: 58 Comm: kworker/u34:0 Tainted: G OE 6.12.0-rc4+ #1
>> [ 1124.675975] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>> [ 1124.676005] Hardware name: IBM pSeries (emulated by qemu) POWER8E (raw) 0x4b0201 of:SLOF,HEAD hv:linux,kvm pSeries
>> [ 1124.676063] Workqueue: events_unbound bpf_map_free_deferred
>> [ 1124.676101] NIP: c000000000551d3c LR: c000000000551c30 CTR: c0000000004733b0
>> [ 1124.676145] REGS: c000000008a37a20 TRAP: 0900 Tainted: G OE (6.12.0-rc4+)
>> [ 1124.676189] MSR: 800000000280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR: 44082828 XER: 00000000
>> [ 1124.676251] CFAR: 0000000000000000 IRQMASK: 0
>> [ 1124.676251] GPR00: c000000000551c30 c000000008a37cc0 c00000000214f800 0000000000000000
>> [ 1124.676251] GPR04: 000000000000003b c00c00000044e3c8 0000000000000000 0000000000000000
>> [ 1124.676251] GPR08: 0000000000000000 0000000000000000 0000000058006001 0000000024082828
>> [ 1124.676251] GPR12: c0000000004733b0 c00000003ffff480 c0000000043cb7c0 c0000000043b1028
>> [ 1124.676251] GPR16: c008000305f78000 0000000000000000 0000000000000001 0000000000000000
>> [ 1124.676251] GPR20: fffffffffffffe7f c008000305f77fff c000000003cbe780 c000000001b26120
>> [ 1124.676251] GPR24: c000000003da0380 ff7fffffffffefbf c000000003cbe780 0000000000000001
>> [ 1124.676251] GPR28: c008000206000000 0000000000000000 c0000000004733b0 c00bf073759e8000
>> [ 1124.676627] NIP [c000000000551d3c] __apply_to_page_range+0x55c/0xea0
>> [ 1124.676667] LR [c000000000551c30] __apply_to_page_range+0x450/0xea0
>> [ 1124.676706] Call Trace:
>> [ 1124.676730] [c000000008a37cc0] [c000000000551c30] __apply_to_page_range+0x450/0xea0 (unreliable)
>> [ 1124.676784] [c000000008a37de0] [c000000000473360] arena_map_free+0x70/0xc0
>
> Thanks for the report.
> I have no idea what's wrong with apply_to_page_range on ppc.
> Don't have any ppc to test and no debugging experience there.
> Unless ppc experts chime in there only option to ignore or disable.
>

Thanks. Disabling sounds better to me as we can still conveniently run
test_progs on ppc. Since some arena tests are quite hard to disable, the
easiest approach is to disable arena allocation on unsupported arches.
I sent the patch [1].

Viktor

[1] https://lore.kernel.org/bpf/20241115082548.74972-1-vmalik@xxxxxxxxxx/T/#u