RE: [PATCH rdma-rc] IB/core: Don't match bond master filter for closed slaves

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Doug Ledford <dledford@xxxxxxxxxx>
> Sent: Thursday, December 6, 2018 3:12 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>; Leon Romanovsky
> <leon@xxxxxxxxxx>
> Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Mark Zhang
> <markz@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-rc] IB/core: Don't match bond master filter for
> closed slaves
> 
> On Thu, 2018-12-06 at 16:01 -0500, Doug Ledford wrote:
> > On Thu, 2018-12-06 at 20:58 +0000, Parav Pandit wrote:
> > > > -----Original Message-----
> > > > From: Doug Ledford <dledford@xxxxxxxxxx>
> > > > Sent: Thursday, December 6, 2018 2:55 PM
> > > > To: Parav Pandit <parav@xxxxxxxxxxxx>; Leon Romanovsky
> > > > <leon@xxxxxxxxxx>
> > > > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Mark Zhang
> > > > <markz@xxxxxxxxxxxx>; RDMA mailing list
> > > > <linux-rdma@xxxxxxxxxxxxxxx>
> > > > Subject: Re: [PATCH rdma-rc] IB/core: Don't match bond master
> > > > filter for closed slaves
> > > >
> > > > On Thu, 2018-12-06 at 19:10 +0000, Parav Pandit wrote:
> > > > > > -----Original Message-----
> > > > > > From: Leon Romanovsky <leon@xxxxxxxxxx>
> > > > > > Sent: Thursday, December 6, 2018 12:48 PM
> > > > > > To: Doug Ledford <dledford@xxxxxxxxxx>
> > > > > > Cc: Jason Gunthorpe <jgg@xxxxxxxxxxxx>; Mark Zhang
> > > > > > <markz@xxxxxxxxxxxx>; RDMA mailing list
> > > > > > <linux-rdma@xxxxxxxxxxxxxxx>; Parav Pandit
> > > > > > <parav@xxxxxxxxxxxx>
> > > > > > Subject: Re: [PATCH rdma-rc] IB/core: Don't match bond master
> > > > > > filter for closed slaves
> > > > > >
> > > > > > On Thu, Dec 06, 2018 at 12:19:30PM -0500, Doug Ledford wrote:
> > > > > > > On Wed, 2018-12-05 at 15:50 +0200, Leon Romanovsky wrote:
> > > > > > > > From: Mark Zhang <markz@xxxxxxxxxxxx>
> > > > > > > >
> > > > > > > > The "modprobe -r mlx4_en" will close bond slaves and in
> > > > > > > > such case the rdma_dev would be NULL, so no match should be
> returned.
> > > > > > >
> > > > > > > I'm not groking your commit message.  What happens without
> > > > > > > this
> > > > patch?
> > > > > > We will see calltrace due to the fact that NULL is passed down
> > > > > > the stack through rdma_is_upper_dev_rcu() upto
> > > > netdev_walk_all_upper_dev_rcu().
> > > > > I see this patch being sent for -rc which I was not sure when I
> reviewed.
> > > > > Commit message needs to change as below.
> > > > >
> > > > > When mlx4_en driver is unloaded, and configured to use bond
> > > > > devices,
> > > > get_netdev() callback returns NULL because associated netdev is
> > > > unregistered.
> > > > > Therefore, similar to other filter() routines, fix
> > > > is_upper_ndev_bond_master_filter() to handle NULL netdev.
> > > > > Without this check, a call trace is observed.
> > > >
> > > > Does it catch a WARN_ON later on, or does it oops?
> > > >
> > > It is NULL reference oops.
> >
> > Yep, I just got through following the trail.  It oopses in
> > netdev_next_upper_dev_rcu().
> >
> > Ok, patch applied to for-rc with commit message fixups.
> >
> 
> I didn't like the commit message as it stood, so I changed it to this.
> Let me know if there are any objections, it's still changeable as long as it's in
> my wip branch:
> 
> IB/core: Fix oops in netdev_next_upper_dev_rcu()
> 
>     On removal of the mlx4_en module, the proper order of removal is to
>     remove the roce devices, and then remove the net devices the roce
In current code, mlx4_ib and mlx4_en are layered on top of mlx4_core.
So mlx4_en can get unloaded even with roce devices created by mlx4_ib exist.
This leads to condition where get_netdev() returns NULL.

So we should modify the commit message a bit.
How about below text,

In bonding configuration of netdevices for using mlx4_en and mlx4_ib in dual port mode,
get_netdev() returns NULL when associated netdevice is unregistered for a port of rdma device.
A filter function is_upper_ndev_bond_master_filter () needs to consider NULL return value by get_netdev().
Otherwise, it results into a NULL reference oops.

To avoid such oops, perform NULL reference check.

>     devices were paired on top of.  If the netdevice happens to be part
>     of a bond, then during netdevice removal, after the roce device is
>     already gone, a call to is_upper_ndev_bond_master_filter() is made.
>     We can't allow the slave netdevice to succeed at the match if the
>     rdma_dev is already gone, otherwise the mentioned oops occurs.
> 
>     Fixes: 408f1242d940 ("IB/core: Delete lower netdevice default GID entries
> in bonding scenario")
>     Signed-off-by: Mark Zhang <markz@xxxxxxxxxxxx>
>     Reviewed-by: Parav Pandit <parav@xxxxxxxxxxxx>
>     Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
>     Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>
> 
> 
> --
> Doug Ledford <dledford@xxxxxxxxxx>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux