On Thu, 2018-12-06 at 16:01 -0500, Doug Ledford wrote: > On Thu, 2018-12-06 at 20:58 +0000, Parav Pandit wrote: > > > -----Original Message----- > > > From: Doug Ledford <dledford@xxxxxxxxxx> > > > Sent: Thursday, December 6, 2018 2:55 PM > > > To: Parav Pandit <parav@xxxxxxxxxxxx>; Leon Romanovsky > > > <leon@xxxxxxxxxx> > > > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Mark Zhang > > > <markz@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx> > > > Subject: Re: [PATCH rdma-rc] IB/core: Don't match bond master filter for > > > closed slaves > > > > > > On Thu, 2018-12-06 at 19:10 +0000, Parav Pandit wrote: > > > > > -----Original Message----- > > > > > From: Leon Romanovsky <leon@xxxxxxxxxx> > > > > > Sent: Thursday, December 6, 2018 12:48 PM > > > > > To: Doug Ledford <dledford@xxxxxxxxxx> > > > > > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Mark Zhang > > > > > <markz@xxxxxxxxxxxx>; RDMA mailing list > > > > > <linux-rdma@xxxxxxxxxxxxxxx>; Parav Pandit <parav@xxxxxxxxxxxx> > > > > > Subject: Re: [PATCH rdma-rc] IB/core: Don't match bond master filter > > > > > for closed slaves > > > > > > > > > > On Thu, Dec 06, 2018 at 12:19:30PM -0500, Doug Ledford wrote: > > > > > > On Wed, 2018-12-05 at 15:50 +0200, Leon Romanovsky wrote: > > > > > > > From: Mark Zhang <markz@xxxxxxxxxxxx> > > > > > > > > > > > > > > The "modprobe -r mlx4_en" will close bond slaves and in such > > > > > > > case the rdma_dev would be NULL, so no match should be returned. > > > > > > > > > > > > I'm not groking your commit message. What happens without this > > > patch? > > > > > We will see calltrace due to the fact that NULL is passed down the > > > > > stack through rdma_is_upper_dev_rcu() upto > > > netdev_walk_all_upper_dev_rcu(). > > > > I see this patch being sent for -rc which I was not sure when I reviewed. > > > > Commit message needs to change as below. > > > > > > > > When mlx4_en driver is unloaded, and configured to use bond devices, > > > get_netdev() callback returns NULL because associated netdev is > > > unregistered. > > > > Therefore, similar to other filter() routines, fix > > > is_upper_ndev_bond_master_filter() to handle NULL netdev. > > > > Without this check, a call trace is observed. > > > > > > Does it catch a WARN_ON later on, or does it oops? > > > > > It is NULL reference oops. > > Yep, I just got through following the trail. It oopses in > netdev_next_upper_dev_rcu(). > > Ok, patch applied to for-rc with commit message fixups. > I didn't like the commit message as it stood, so I changed it to this. Let me know if there are any objections, it's still changeable as long as it's in my wip branch: IB/core: Fix oops in netdev_next_upper_dev_rcu() On removal of the mlx4_en module, the proper order of removal is to remove the roce devices, and then remove the net devices the roce devices were paired on top of. If the netdevice happens to be part of a bond, then during netdevice removal, after the roce device is already gone, a call to is_upper_ndev_bond_master_filter() is made. We can't allow the slave netdevice to succeed at the match if the rdma_dev is already gone, otherwise the mentioned oops occurs. Fixes: 408f1242d940 ("IB/core: Delete lower netdevice default GID entries in bonding scenario") Signed-off-by: Mark Zhang <markz@xxxxxxxxxxxx> Reviewed-by: Parav Pandit <parav@xxxxxxxxxxxx> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx> -- Doug Ledford <dledford@xxxxxxxxxx> GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Attachment:
signature.asc
Description: This is a digitally signed message part