On Fri, Jan 20, 2023 at 04:52:21PM -0400, Jason Gunthorpe wrote: > On Fri, Jan 20, 2023 at 07:02:48PM +0200, Leon Romanovsky wrote: > > From: Dragos Tatulea <dtatulea@xxxxxxxxxx> > > > > The cited commit creates child PKEY interfaces over netlink will multiple > > tx and rx queues, but some devices doesn't support more than 1 tx and 1 rx > > queues. This causes to a crash when traffic is sent over the PKEY interface > > due to the parent having a single queue but the child having multiple queues. > > > > This patch inherits the real_num_tx/rx_queues from the parent netdev. > > > > BUG: kernel NULL pointer dereference, address: 000000000000036b > > PGD 0 P4D 0 > > Oops: 0000 [#1] SMP > > CPU: 4 PID: 209665 Comm: python3 Not tainted 6.1.0_for_upstream_min_debug_2022_12_12_17_02 #1 > > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 > > RIP: 0010:kmem_cache_alloc+0xcb/0x450 > > Code: ce 7e 49 8b 50 08 49 83 78 10 00 4d 8b 28 0f 84 cb 02 00 00 4d 85 ed 0f 84 c2 02 00 00 41 8b 44 24 28 48 8d 4a 01 49 8b 3c 24 <49> 8b 5c 05 00 4c 89 e8 65 48 0f c7 0f 0f 94 c0 84 c0 74 b8 41 8b > > RSP: 0018:ffff88822acbbab8 EFLAGS: 00010202 > > RAX: 0000000000000070 RBX: ffff8881c28e3e00 RCX: 00000000064f8dae > > RDX: 00000000064f8dad RSI: 0000000000000a20 RDI: 0000000000030d00 > > RBP: 0000000000000a20 R08: ffff8882f5d30d00 R09: ffff888104032f40 > > R10: ffff88810fade828 R11: 736f6d6570736575 R12: ffff88810081c000 > > R13: 00000000000002fb R14: ffffffff817fc865 R15: 0000000000000000 > > FS: 00007f9324ff9700(0000) GS:ffff8882f5d00000(0000) knlGS:0000000000000000 > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > CR2: 000000000000036b CR3: 00000001125af004 CR4: 0000000000370ea0 > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > > Call Trace: > > <TASK> > > skb_clone+0x55/0xd0 > > ip6_finish_output2+0x3fe/0x690 > > ip6_finish_output+0xfa/0x310 > > ip6_send_skb+0x1e/0x60 > > udp_v6_send_skb+0x1e5/0x420 > > udpv6_sendmsg+0xb3c/0xe60 > > ? ip_mc_finish_output+0x180/0x180 > > ? __switch_to_asm+0x3a/0x60 > > ? __switch_to_asm+0x34/0x60 > > sock_sendmsg+0x33/0x40 > > __sys_sendto+0x103/0x160 > > ? _copy_to_user+0x21/0x30 > > ? kvm_clock_get_cycles+0xd/0x10 > > ? ktime_get_ts64+0x49/0xe0 > > __x64_sys_sendto+0x25/0x30 > > do_syscall_64+0x3d/0x90 > > entry_SYSCALL_64_after_hwframe+0x46/0xb0 > > RIP: 0033:0x7f9374f1ed14 > > Code: 42 41 f8 ff 44 8b 4c 24 2c 4c 8b 44 24 20 89 c5 44 8b 54 24 28 48 8b 54 24 18 b8 2c 00 00 00 48 8b 74 24 10 8b 7c 24 08 0f 05 <48> 3d 00 f0 ff ff 77 34 89 ef 48 89 44 24 08 e8 68 41 f8 ff 48 8b > > RSP: 002b:00007f9324ff7bd0 EFLAGS: 00000293 ORIG_RAX: 000000000000002c > > RAX: ffffffffffffffda RBX: 00007f9324ff7cc8 RCX: 00007f9374f1ed14 > > RDX: 00000000000002fb RSI: 00007f93000052f0 RDI: 0000000000000030 > > RBP: 0000000000000000 R08: 00007f9324ff7d40 R09: 000000000000001c > > R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000 > > R13: 000000012a05f200 R14: 0000000000000001 R15: 00007f9374d57bdc > > </TASK> > > > > Fixes: dbc94a0fb817 ("IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces") > > Signed-off-by: Dragos Tatulea <dtatulea@xxxxxxxxxx> > > Reviewed-by: Tariq Toukan <tariqt@xxxxxxxxxx> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx> > > --- > > Changelog: > > v1: > > * Fixed typo in warning print. > > v0: https://lore.kernel.org/all/4a7ecec08ee30ad8004019818fadf1e58057e945.1674137153.git.leon@xxxxxxxxxx > > --- > > drivers/infiniband/ulp/ipoib/ipoib_netlink.c | 12 ++++++++++++ > > 1 file changed, 12 insertions(+) > > > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c > > index 9ad8d9856275..0548735a15b5 100644 > > --- a/drivers/infiniband/ulp/ipoib/ipoib_netlink.c > > +++ b/drivers/infiniband/ulp/ipoib/ipoib_netlink.c > > @@ -126,6 +126,18 @@ static int ipoib_new_child_link(struct net *src_net, struct net_device *dev, > > } else > > child_pkey = nla_get_u16(data[IFLA_IPOIB_PKEY]); > > > > + err = netif_set_real_num_tx_queues(dev, pdev->real_num_tx_queues); > > + if (err) { > > + ipoib_warn(ppriv, "failed setting the child tx queue count based on parent\n"); > > + return err; > > + } > > + > > + err = netif_set_real_num_rx_queues(dev, pdev->real_num_rx_queues); > > + if (err) { > > + ipoib_warn(ppriv, "failed setting the child rx queue count based on parent\n"); > > + return err; > > + } > > This still seems flawed.. Netlink does this: > > unsigned int num_rx_queues = 1; > > if (tb[IFLA_NUM_RX_QUEUES]) > num_rx_queues = nla_get_u32(tb[IFLA_NUM_RX_QUEUES]); > else if (ops->get_num_rx_queues) > num_rx_queues = ops->get_num_rx_queues(); > > So num_rx_queues can really be any value that userspaces cares to > provide. > > If pdev->real_num_rx_queues is > the user provided value then > netif_set_real_num_rx_queues() just fails. > > So at a minimum this should min the actual number of queues requested > against the maximum number of queues the driver can provide and use > that to set the real queues. > > And the return of a really big number from ops->get_num_rx_queues is > pretty ugly too, ideally that would be fixed to pass in some function > arguments and obtain the ppriv so it can return the actual maximum > number of queues and we don't waste a bunch of memory.. .get_num_rx_queues() is declared as void, so it can't have any complex logic except returns some global define. Thanks > > Jason