On Fri, Jan 20, 2023 at 07:02:48PM +0200, Leon Romanovsky wrote: > From: Dragos Tatulea <dtatulea@xxxxxxxxxx> > > The cited commit creates child PKEY interfaces over netlink will multiple > tx and rx queues, but some devices doesn't support more than 1 tx and 1 rx > queues. This causes to a crash when traffic is sent over the PKEY interface > due to the parent having a single queue but the child having multiple queues. > > This patch inherits the real_num_tx/rx_queues from the parent netdev. > > BUG: kernel NULL pointer dereference, address: 000000000000036b > PGD 0 P4D 0 > Oops: 0000 [#1] SMP > CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > RIP: 0010:kmem_cache_alloc+0xcb/0x450 > Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b > RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202 > RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae > RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00 > RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40 > R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000 > R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000 > FS: 00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > Call Trace: > <TASK> > skb_clone+0x55/0xd0 > ip6_finish_output2+0x3fe/0x690 > ip6_finish_output+0xfa/0x310 > ip6_send_skb+0x1e/0x60 > udp_v6_send_skb+0x1e5/0x420 > udpv6_sendmsg+0xb3c/0xe60 > ? ip_mc_finish_output+0x180/0x180 > ? __switch_to_asm+0x3a/0x60 > ? __switch_to_asm+0x34/0x60 > sock_sendmsg+0x33/0x40 > __sys_sendto+0x103/0x160 > ? _copy_to_user+0x21/0x30 > ? kvm_clock_get_cycles+0xd/0x10 > ? ktime_get_ts64+0x49/0xe0 > __x64_sys_sendto+0x25/0x30 > do_syscall_64+0x3d/0x90 > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > RIP: 0033:0x7f9374f1ed14 > Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b > RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c > RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14 > RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030 > RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 > R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc > </TASK> > > Fixes: dbc94a0fb817 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces") > Signed-off-by: Dragos Tatulea <dtatulea@xxxxxxxxxx> > Reviewed-by: Tariq Toukan <tariqt@xxxxxxxxxx> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx> > --- > Changelog: > v1: > * Fixed typo in warning print. > v0: https://lore.kernel.org/all/4a7ecec08ee30ad8004019818fadf1e58057e945.1674137153.git.leon@xxxxxxxxxx > --- > drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c > index 9ad8d9856275..0548735a15b5 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c > @@ -126,6 +126,18 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev, > } else > child_pkey = nla_get_u16(data[IFLA_IPOIB_PKEY]); > > + err = netif_set_real_num_tx_queues(dev, pdev->real_num_tx_queues); > + if (err) { > + ipoib_warn(ppriv, "failed setting the child tx queue count based on parent\n"); > + return err; > + } > + > + err = netif_set_real_num_rx_queues(dev, pdev->real_num_rx_queues); > + if (err) { > + ipoib_warn(ppriv, "failed setting the child rx queue count based on parent\n"); > + return err; > + } This still seems flawed.. Netlink does this: unsigned int num_rx_queues = 1; if (tb[IFLA_NUM_RX_QUEUES]) num_rx_queues = nla_get_u32(tb[IFLA_NUM_RX_QUEUES]); else if (ops->get_num_rx_queues) num_rx_queues = ops->get_num_rx_queues(); So num_rx_queues can really be any value that userspaces cares to provide. If pdev->real_num_rx_queues is > the user provided value then netif_set_real_num_rx_queues() just fails. So at a minimum this should min the actual number of queues requested against the maximum number of queues the driver can provide and use that to set the real queues. And the return of a really big number from ops->get_num_rx_queues is pretty ugly too, ideally that would be fixed to pass in some function arguments and obtain the ppriv so it can return the actual maximum number of queues and we don't waste a bunch of memory.. Jason