RE: [PATCH for-next 16/16] IB/ipoib: Fix for potential no-carrier state

> -----Original Message-----
> From: Doug Ledford [mailto:dledford@xxxxxxxxxx]
> Sent: Friday, January 26, 2018 12:00 PM
> To: Dalessandro, Dennis <dennis.dalessandro@xxxxxxxxx>; jgg@xxxxxxxx
> Cc: linux-rdma@xxxxxxxxxxxxxxx; Marciniszyn, Mike <mike.marciniszyn@xxxxxxxxx>;
> Weiny, Ira <ira.weiny@xxxxxxxxx>; Estrin, Alex <alex.estrin@xxxxxxxxx>
> Subject: Re: [PATCH for-next 16/16] IB/ipoib: Fix for potential no-carrier state
> 
> On Fri, 2018-01-26 at 06:33 -0800, Dennis Dalessandro wrote:
> > From: Alex Estrin <alex.estrin@xxxxxxxxx>
> >
> > On reboot SM can program port pkey table before ipoib registered its
> > event handler, which could result in missing pkey event and leave root
> > interface with initial pkey value from index 0.
> >
> > Since OPA port starts with invalid pkey in index 0, root interface will
> > fail to initialize and stay down with no-carrier flag.
> >
> > For IB ipoib interface may end up with pkey different from value
> > opensm put in pkey table idx 0, resulting in connectivity issues
> > (different mcast groups, for example).
> >
> > Close the window by calling event handler after registration
> > to make sure ipoib pkey is in sync with port pkey table.
> >
> > Reviewed-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>
> > Reviewed-by: Ira Weiny <ira.weiny@xxxxxxxxx>
> > Signed-off-by: Alex Estrin <alex.estrin@xxxxxxxxx>
> > Signed-off-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>
> > ---
> >  drivers/infiniband/ulp/ipoib/ipoib_main.c |    3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > index 5930c7d..161ba8c 100644
> > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> > @@ -2306,6 +2306,9 @@ void ipoib_set_dev_features(struct ipoib_dev_priv *priv, struct ib_device *hca)
> >  			      priv->ca, ipoib_event);
> >  	ib_register_event_handler(&priv->event_handler);
> >
> > +	/* call event handler to ensure pkey in sync */
> > +	queue_work(ipoib_workqueue, &priv->flush_heavy);
> > +
> 
> This seems like a bit of a sledgehammer to the issue.  Looking through
> ipoib_add_port(), the real race is that we have to call ib_query_pkey()
> early in the init sequence as some of the later steps need it to be set
> (ipoib_dev_init() must have it already set for one), but since we don't
> setup our event handler until after we've finished setting up the
> device, there is that window from our first ib_query_pkey call until we
> complete the ib_register_event_handler() call for the pkey to change.
> Instead of throwing the flush regardless, it might be nicer to do:
> 
> 	{
> 		u16 new_pkey;
> 
> 		ib_query_pkey(hca, port, 0, &new_pkey);
> 		if (priv->pkey != (new_pkey | 0x8000))
> 			/* The pkey changed between when we
> 			 * read it and now, flush the device
> 			 */
> 			queue_work(ipoib_workqueue, &priv->flush_heavy);
> 	}

Hi Doug,
The reason I did not go this way is that, at this early point of initialization,
the pkey handler will not act as a "sledgehammer" and flush all cached records.
It will query the pkey, update it if it changed, and then exit early.
In any case, we will query the pkey only once, not twice as may happen with your approach.

Thanks,
Alex.
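
To make the query-count argument above concrete, here is a small user-space
model of the two approaches. It is only a sketch based on the descriptions in
this thread, not kernel code: every sim_* name, the PKEY_FULL_MEMBER macro and
the query counter are hypothetical stand-ins, and the flush handler is reduced
to "query the pkey, update if changed, exit early".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PKEY_FULL_MEMBER 0x8000u	/* full-membership bit, as in the 0x8000 above */

/* Hypothetical stand-in for a port: pkey table index 0 plus a query counter. */
struct sim_port {
	uint16_t pkey_idx0;
	unsigned int queries;
};

/* Hypothetical stand-in for the ipoib private data: only the cached pkey. */
struct sim_priv {
	uint16_t pkey;
};

/* Models a pkey table read and counts how often it happens. */
static uint16_t sim_query_pkey(struct sim_port *port)
{
	assert(port);
	port->queries++;
	return port->pkey_idx0;
}

/* Early init: the pkey is read before the event handler is registered. */
static void sim_early_init(struct sim_priv *priv, struct sim_port *port)
{
	priv->pkey = (uint16_t)(sim_query_pkey(port) | PKEY_FULL_MEMBER);
}

/*
 * Simplified flush handler for the early-init case: query the pkey,
 * update the cached value if it changed, then return.
 * Returns true when the cached pkey was updated.
 */
static bool sim_flush_heavy(struct sim_priv *priv, struct sim_port *port)
{
	uint16_t new_pkey = (uint16_t)(sim_query_pkey(port) | PKEY_FULL_MEMBER);

	if (new_pkey == priv->pkey)
		return false;
	priv->pkey = new_pkey;
	return true;
}

/* Approach in the patch: always run the handler after registration. */
static void sim_resync_unconditional(struct sim_priv *priv, struct sim_port *port)
{
	sim_flush_heavy(priv, port);
}

/* Suggested alternative: compare first, run the handler only on mismatch. */
static void sim_resync_conditional(struct sim_priv *priv, struct sim_port *port)
{
	uint16_t new_pkey = sim_query_pkey(port);

	if (priv->pkey != (new_pkey | PKEY_FULL_MEMBER))
		sim_flush_heavy(priv, port);
}
```

In this model both variants leave the cached pkey in sync. After registration
the unconditional variant always performs one query, while the conditional one
performs one query when nothing changed and two when the pkey changed inside
the window.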
 
> 
> >  	result = register_netdev(priv->dev);
> >  	if (result) {
> >  		pr_warn("%s: couldn't register ipoib port %d; error %d\n",
> >
> 
> --
> Doug Ledford <dledford@xxxxxxxxxx>
>     GPG KeyID: B826A3330E572FDD
>     Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD



