> -----Original Message----- > From: Doug Ledford <dledford@xxxxxxxxxx> > Sent: Thursday, December 6, 2018 3:12 PM > To: Parav Pandit <parav@xxxxxxxxxxxx>; Leon Romanovsky > <leon@xxxxxxxxxx> > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Mark Zhang > <markz@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH rdma-rc] IB/core: Don't match bond master filter for > closed slaves > > On Thu, 2018-12-06 at 16:01 -0500, Doug Ledford wrote: > > On Thu, 2018-12-06 at 20:58 +0000, Parav Pandit wrote: > > > > -----Original Message----- > > > > From: Doug Ledford <dledford@xxxxxxxxxx> > > > > Sent: Thursday, December 6, 2018 2:55 PM > > > > To: Parav Pandit <parav@xxxxxxxxxxxx>; Leon Romanovsky > > > > <leon@xxxxxxxxxx> > > > > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Mark Zhang > > > > <markz@xxxxxxxxxxxx>; RDMA mailing list > > > > <linux-rdma@xxxxxxxxxxxxxxx> > > > > Subject: Re: [PATCH rdma-rc] IB/core: Don't match bond master > > > > filter for closed slaves > > > > > > > > On Thu, 2018-12-06 at 19:10 +0000, Parav Pandit wrote: > > > > > > -----Original Message----- > > > > > > From: Leon Romanovsky <leon@xxxxxxxxxx> > > > > > > Sent: Thursday, December 6, 2018 12:48 PM > > > > > > To: Doug Ledford <dledford@xxxxxxxxxx> > > > > > > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Mark Zhang > > > > > > <markz@xxxxxxxxxxxx>; RDMA mailing list > > > > > > <linux-rdma@xxxxxxxxxxxxxxx>; Parav Pandit > > > > > > <parav@xxxxxxxxxxxx> > > > > > > Subject: Re: [PATCH rdma-rc] IB/core: Don't match bond master > > > > > > filter for closed slaves > > > > > > > > > > > > On Thu, Dec 06, 2018 at 12:19:30PM -0500, Doug Ledford wrote: > > > > > > > On Wed, 2018-12-05 at 15:50 +0200, Leon Romanovsky wrote: > > > > > > > > From: Mark Zhang <markz@xxxxxxxxxxxx> > > > > > > > > > > > > > > > > The "modprobe -r mlx4_en" will close bond slaves and in > > > > > > > > such case the rdma_dev would be NULL, so no match should be > returned. > > > > > > > > > > > > > > I'm not groking your commit message. What happens without > > > > > > > this > > > > patch? > > > > > > We will see calltrace due to the fact that NULL is passed down > > > > > > the stack through rdma_is_upper_dev_rcu() upto > > > > netdev_walk_all_upper_dev_rcu(). > > > > > I see this patch being sent for -rc which I was not sure when I > reviewed. > > > > > Commit message needs to change as below. > > > > > > > > > > When mlx4_en driver is unloaded, and configured to use bond > > > > > devices, > > > > get_netdev() callback returns NULL because associated netdev is > > > > unregistered. > > > > > Therefore, similar to other filter() routines, fix > > > > is_upper_ndev_bond_master_filter() to handle NULL netdev. > > > > > Without this check, a call trace is observed. > > > > > > > > Does it catch a WARN_ON later on, or does it oops? > > > > > > > It is NULL reference oops. > > > > Yep, I just got through following the trail. It oopses in > > netdev_next_upper_dev_rcu(). > > > > Ok, patch applied to for-rc with commit message fixups. > > > > I didn't like the commit message as it stood, so I changed it to this. > Let me know if there are any objections, it's still changeable as long as it's in > my wip branch: > > IB/core: Fix oops in netdev_next_upper_dev_rcu() > > On removal of the mlx4_en module, the proper order of removal is to > remove the roce devices, and then remove the net devices the roce In current code, mlx4_ib and mlx4_en are layered on top of mlx4_core. So mlx4_en can get unloaded even with roce devices created by mlx4_ib exist. This leads to condition where get_netdev() returns NULL. So we should modify the commit message a bit. How about below text, In bonding configuration of netdevices for using mlx4_en and mlx4_ib in dual port mode, get_netdev() returns NULL when associated netdevice is unregistered for a port of rdma device. A filter function is_upper_ndev_bond_master_filter () needs to consider NULL return value by get_netdev(). Otherwise, it results into a NULL reference oops. To avoid such oops, perform NULL reference check. > devices were paired on top of. If the netdevice happens to be part > of a bond, then during netdevice removal, after the roce device is > already gone, a call to is_upper_ndev_bond_master_filter() is made. > We can't allow the slave netdevice to succeed at the match if the > rdma_dev is already gone, otherwise the mentioned oops occurs. > > Fixes: 408f1242d940 ("IB/core: Delete lower netdevice default GID entries > in bonding scenario") > Signed-off-by: Mark Zhang <markz@xxxxxxxxxxxx> > Reviewed-by: Parav Pandit <parav@xxxxxxxxxxxx> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx> > > > -- > Doug Ledford <dledford@xxxxxxxxxx> > GPG KeyID: B826A3330E572FDD > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD