On 26/04/2023 01.30, Alexei Starovoitov wrote:
On Tue, Apr 25, 2023 at 07:15:43PM +0200, Jesper Dangaard Brouer wrote:
Performance is very important for page pool (PP). This add the use of
static_key APIs for regaining a single instruction, which makes the
new PP shutdown scheme zero impact.
We are uncertain if this is 100% correct, because static_key APIs uses
a mutex lock and it is uncertain if all contexts that can return pages
can support this. We could spawn a workqueue (like we just removed) to
workaround this issue.
With debug atomic sleep the issue should be trivial to see.
iirc the callers of xdp_flush_frame_bulk() need to do it under rcu_read_lock equivalent,
which is not sleepable and mutex-es should warn.
Sure, adding CONFIG_DEBUG_ATOMIC_SLEEP and (keeping) CONFIG_DEBUG_MUTEXES.
Testing this with veth changes that added page_pool to veth
commit 0ebab78cbcbf ("net: veth: add page_pool for page recycling").
which makes it easier to create/trigger inflight packets
via softnet_data->defer_list.
It confirms my suspicion, this patch is not correct.
The dmesg warning looks like below.
(The XXX printk is just from my own debugging patch)
The call site net_rx_action+0xe7 is skb_defer_free_flush
$ ./scripts/faddr2line vmlinux net_rx_action+0xe7
net_rx_action+0xe7/0x360:
skb_defer_free_flush at net/core/dev.c:6615
(inlined by) skb_defer_free_flush at net/core/dev.c:6601
(inlined by) net_rx_action at net/core/dev.c:6677
[ 1540.073256] XXX page_pool_shutdown_attempt() inflight=24
[ 1540.078654] XXX page_pool_destroy() Enter into shutdown phase -
inflight=24
[ 1540.094234] XXX page_pool_shutdown_attempt() inflight=23
[ 1540.099626] XXX page_pool_shutdown_attempt() inflight=22
[ 1540.105012] XXX page_pool_shutdown_attempt() inflight=21
[ 1540.110402] XXX page_pool_shutdown_attempt() inflight=20
[ 1540.115793] XXX page_pool_shutdown_attempt() inflight=19
[ 1540.121183] XXX page_pool_shutdown_attempt() inflight=18
[ 1540.126564] XXX page_pool_shutdown_attempt() inflight=17
[ 1540.131945] XXX page_pool_shutdown_attempt() inflight=16
[ 1540.137329] XXX page_pool_shutdown_attempt() inflight=15
[ 1540.142710] XXX page_pool_shutdown_attempt() inflight=14
[ 1540.148110] XXX page_pool_shutdown_attempt() inflight=13
[ 1540.153512] XXX page_pool_shutdown_attempt() inflight=12
[ 1540.312284] XXX page_pool_shutdown_attempt() inflight=11
[ 1540.317712] XXX page_pool_shutdown_attempt() inflight=10
[ 1540.323128] XXX page_pool_shutdown_attempt() inflight=9
[ 1540.328459] XXX page_pool_shutdown_attempt() inflight=8
[ 1540.333788] XXX page_pool_shutdown_attempt() inflight=7
[ 1540.339111] XXX page_pool_shutdown_attempt() inflight=6
[ 1540.344439] XXX page_pool_shutdown_attempt() inflight=5
[ 1540.349768] XXX page_pool_shutdown_attempt() inflight=4
[ 1540.355091] XXX page_pool_shutdown_attempt() inflight=3
[ 1540.360420] XXX page_pool_shutdown_attempt() inflight=2
[ 1540.365741] XXX page_pool_shutdown_attempt() inflight=1
[ 1540.371064] BUG: sleeping function called from invalid context at
include/linux/percpu-rwsem.h:49
[ 1540.380052] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid:
26501, name: netperf
[ 1540.388171] preempt_count: 101, expected: 0
[ 1540.392452] RCU nest depth: 2, expected: 0
[ 1540.396640] CPU: 2 PID: 26501 Comm: netperf Not tainted
6.3.0-rc7-net-next-pp-shutdown-vm-lock-dbg+ #67
[ 1540.406147] Hardware name: Supermicro Super Server/X10SRi-F, BIOS
2.0a 08/01/2016
[ 1540.413749] Call Trace:
[ 1540.416289] <IRQ>
[ 1540.418395] dump_stack_lvl+0x32/0x50
[ 1540.422160] __might_resched+0x11c/0x160
[ 1540.426182] cpus_read_lock+0x16/0x60
[ 1540.429942] static_key_slow_dec+0x17/0x50
[ 1540.434137] page_pool_shutdown_attempt+0x50/0x60
[ 1540.438940] page_pool_return_skb_page+0x68/0xe0
[ 1540.443654] skb_free_head+0x4f/0x90
[ 1540.447326] skb_release_data+0x142/0x1a0
[ 1540.451435] napi_consume_skb+0x6b/0x180
[ 1540.455448] net_rx_action+0xe7/0x360
[ 1540.459209] __do_softirq+0xcb/0x2b1
[ 1540.462885] do_softirq+0x63/0x90
[ 1540.466297] </IRQ>
[ 1540.468487] <TASK>
[ 1540.470671] __local_bh_enable_ip+0x64/0x70
[ 1540.474948] __dev_queue_xmit+0x257/0x8a0
[ 1540.479049] ? __ip_local_out+0x48/0x160
[ 1540.483070] ip_finish_output2+0x25c/0x540
[ 1540.487255] __ip_queue_xmit+0x171/0x420
[ 1540.491266] ? ___slab_alloc+0xd1/0x670
[ 1540.495202] __tcp_transmit_skb+0x83c/0x950
[ 1540.499486] tcp_write_xmit+0x373/0xa60
[ 1540.503419] __tcp_push_pending_frames+0x32/0xf0
[ 1540.508125] tcp_sendmsg_locked+0x3a0/0xa10
[ 1540.512408] tcp_sendmsg+0x27/0x40
[ 1540.515907] sock_sendmsg+0x8b/0xa0
[ 1540.519494] ? sockfd_lookup_light+0x12/0x70
[ 1540.523856] __sys_sendto+0xeb/0x130
[ 1540.527531] ? __switch_to_asm+0x3a/0x80
[ 1540.531549] ? rseq_get_rseq_cs+0x32/0x290
[ 1540.535736] ? rseq_ip_fixup+0x14f/0x1e0
[ 1540.539751] __x64_sys_sendto+0x20/0x30
[ 1540.543686] do_syscall_64+0x3a/0x90
[ 1540.547357] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1540.552502] RIP: 0033:0x7f389f713680
[ 1540.556174] Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca
64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 55 48 83 ec 20 48
[ 1540.575075] RSP: 002b:00007ffcb4d4a538 EFLAGS: 00000246 ORIG_RAX:
000000000000002c
[ 1540.582755] RAX: ffffffffffffffda RBX: 000000000063fd28 RCX:
00007f389f713680
[ 1540.589983] RDX: 0000000000004000 RSI: 0000000001ac3c40 RDI:
0000000000000004
[ 1540.597208] RBP: 00007ffcb4d4a570 R08: 0000000000000000 R09:
0000000000000000
[ 1540.604438] R10: 0000000000000000 R11: 0000000000000246 R12:
000000000063fcf8
[ 1540.611664] R13: 00007ffcb4d4a910 R14: 0000000000000000 R15:
00007f389fa87000
[ 1540.618896] </TASK>
[ 1540.621199] ------------[ cut here ]------------
[ 1540.625911] WARNING: CPU: 2 PID: 26501 at kernel/smp.c:912
smp_call_function_many_cond+0x2c1/0x2e0
[ 1540.635001] Modules linked in: veth nf_defrag_ipv6 nf_defrag_ipv4
ib_umad rdma_ucm ib_ipoib rdma_cm iw_cm irdma ib_cm ice
intel_uncore_frequency intel_uncore_frequency_common coretemp kvm_intel
mlx5_ib kvm ib_uverbs irqbypass rapl intel_cstate ib_core i2c_i801
intel_uncore pcspkr i2c_smbus bfq acpi_ipmi wmi ipmi_si ipmi_devintf
ipmi_msghandler acpi_pad sch_fq_codel drm fuse zram zsmalloc mlx5_core
ixgbe i40e igb igc sd_mod t10_pi mlxfw psample hid_generic
crc64_rocksoft_generic i2c_algo_bit crc64_rocksoft i2c_core mdio ptp
pps_core [last unloaded: bpfilter]
[ 1540.684653] CPU: 2 PID: 26501 Comm: netperf Tainted: G W
6.3.0-rc7-net-next-pp-shutdown-vm-lock-dbg+ #67
[ 1540.695642] Hardware name: Supermicro Super Server/X10SRi-F, BIOS
2.0a 08/01/2016
[ 1540.703239] RIP: 0010:smp_call_function_many_cond+0x2c1/0x2e0
[ 1540.709090] Code: fe ff ff 8b 44 24 14 48 c7 c6 a0 04 fc 82 8b 15 85
f4 e6 01 48 8b 3c 24 8d 48 01 48 63 c9 e8 36 c5 4b 00 89 c2 e9 c5 fd ff
ff <0f> 0b e9 86 fd ff ff 8b 7c 24 28 e8 4f b8 f7 ff eb af 66 66 2e 0f
[ 1540.727999] RSP: 0018:ffffc9000003cda8 EFLAGS: 00010206
[ 1540.733322] RAX: 0000000000000102 RBX: ffffffff8383b2ea RCX:
0000000000000003
[ 1540.740552] RDX: 0000000000000000 RSI: ffffffff810271a0 RDI:
ffffffff82fc04a0
[ 1540.747782] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000001
[ 1540.755008] R10: 000000087c9e9000 R11: ffffffff82c4f620 R12:
0000000000000000
[ 1540.762241] R13: ffffffff8383b300 R14: 0000000000000002 R15:
0000000000000001
[ 1540.769474] FS: 00007f389f954740(0000) GS:ffff88887fc80000(0000)
knlGS:0000000000000000
[ 1540.777685] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1540.783528] CR2: 00007f1e27582000 CR3: 000000015775c004 CR4:
00000000003706e0
[ 1540.790761] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 1540.797987] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 1540.805220] Call Trace:
[ 1540.807763] <IRQ>
[ 1540.809868] ? __pfx_do_sync_core+0x10/0x10
[ 1540.814146] on_each_cpu_cond_mask+0x20/0x40
[ 1540.818514] text_poke_bp_batch+0x9e/0x250
[ 1540.822708] text_poke_finish+0x1b/0x30
[ 1540.826644] arch_jump_label_transform_apply+0x16/0x30
[ 1540.831880] __static_key_slow_dec_cpuslocked.part.0+0x2f/0x40
[ 1540.837809] static_key_slow_dec+0x30/0x50
[ 1540.841999] page_pool_shutdown_attempt+0x50/0x60
[ 1540.846803] page_pool_return_skb_page+0x68/0xe0
[ 1540.851517] skb_free_head+0x4f/0x90
[ 1540.855194] skb_release_data+0x142/0x1a0
[ 1540.859300] napi_consume_skb+0x6b/0x180
[ 1540.863320] net_rx_action+0xe7/0x360
[ 1540.867082] __do_softirq+0xcb/0x2b1
[ 1540.870756] do_softirq+0x63/0x90
[ 1540.874171] </IRQ>
[ 1540.876359] <TASK>
[ 1540.878553] __local_bh_enable_ip+0x64/0x70
[ 1540.882829] __dev_queue_xmit+0x257/0x8a0
[ 1540.886938] ? __ip_local_out+0x48/0x160
[ 1540.890959] ip_finish_output2+0x25c/0x540
[ 1540.895155] __ip_queue_xmit+0x171/0x420
[ 1540.899175] ? ___slab_alloc+0xd1/0x670
[ 1540.903108] __tcp_transmit_skb+0x83c/0x950
[ 1540.907393] tcp_write_xmit+0x373/0xa60
[ 1540.911327] __tcp_push_pending_frames+0x32/0xf0
[ 1540.916041] tcp_sendmsg_locked+0x3a0/0xa10
[ 1540.920324] tcp_sendmsg+0x27/0x40
[ 1540.923821] sock_sendmsg+0x8b/0xa0
[ 1540.927411] ? sockfd_lookup_light+0x12/0x70
[ 1540.931780] __sys_sendto+0xeb/0x130
[ 1540.935456] ? __switch_to_asm+0x3a/0x80
[ 1540.939477] ? rseq_get_rseq_cs+0x32/0x290
[ 1540.943670] ? rseq_ip_fixup+0x14f/0x1e0
[ 1540.947693] __x64_sys_sendto+0x20/0x30
[ 1540.951626] do_syscall_64+0x3a/0x90
[ 1540.955300] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[ 1540.960451] RIP: 0033:0x7f389f713680
[ 1540.964124] Code: 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 41 89 ca
64 8b 04 25 18 00 00 00 85 c0 75 1d 45 31 c9 45 31 c0 b8 2c 00 00 00 0f
05 <48> 3d 00 f0 ff ff 77 68 c3 0f 1f 80 00 00 00 00 55 48 83 ec 20 48
[ 1540.983035] RSP: 002b:00007ffcb4d4a538 EFLAGS: 00000246 ORIG_RAX:
000000000000002c
[ 1540.990724] RAX: ffffffffffffffda RBX: 000000000063fd28 RCX:
00007f389f713680
[ 1540.997957] RDX: 0000000000004000 RSI: 0000000001ac3c40 RDI:
0000000000000004
[ 1541.005187] RBP: 00007ffcb4d4a570 R08: 0000000000000000 R09:
0000000000000000
[ 1541.012422] R10: 0000000000000000 R11: 0000000000000246 R12:
000000000063fcf8
[ 1541.019649] R13: 00007ffcb4d4a910 R14: 0000000000000000 R15:
00007f389fa87000
[ 1541.026881] </TASK>
[ 1541.029166] ---[ end trace 0000000000000000 ]---
[ 1541.033925] XXX page_pool_shutdown_attempt() inflight=0