RE: [PATCH rdma-next 3/7] IB/core: Simplify ib_query_gid to always refer to cache

Parav Pandit <parav@xxxxxxxxxxxx> · Fri, 30 Mar 2018 00:03:54 +0000



> -----Original Message-----
> From: Hefty, Sean [mailto:sean.hefty@xxxxxxxxx]
> Sent: Thursday, March 29, 2018 5:29 PM
> To: Jason Gunthorpe <jgg@xxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>
> Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>;
> Mark Bloch <markb@xxxxxxxxxxxx>; Parav Pandit <parav@xxxxxxxxxxxx>
> Subject: RE: [PATCH rdma-next 3/7] IB/core: Simplify ib_query_gid to always
> refer to cache
> 
> > On Wed, Mar 28, 2018 at 03:43:40PM +0300, Leon Romanovsky wrote:
> > > From: Parav Pandit <parav@xxxxxxxxxxxx>
> > >
> > > Currently following inconsistencies exist.
> > > 1. ib_query_gid() returns GID from the software cache for a RoCE
> > port
> > > and returns GID from the HCA for an IB port.
> > > This is incorrect because software GID cache is maintained
> > regardless
> > > of HCA port type.
> > >
> > > 2. GID is queries from the HCA via ib_query_gid and updated in the
> > > software cache for IB link layer. Both of them might not be in
> > sync.
> > >
> > > ULPs such as SRP initiator, SRP target, IPoIB driver have
> > historically
> > > used ib_query_gid() API to query the GID. However CM used cached
> > > version during CM processing, When software cache was introduced,
> > this
> > > inconsitency remained.
> > >
> > > In order to simplify, improve readability and avoid link layer
> > > specific above inconsistencies, this patch brings following
> > changes.
> > >
> > > 1. ib_query_gid() always refers to the cache layer regardless of
> > link
> > > layer.
> > >
> > > 2. cache module who reads the GID entry from HCA and builds the
> > cache,
> > > directly invokes the HCA provider verb's query_gid() callback
> > function.
> > >
> > > 3. ib_query_port() is being called in early stage where GID cache
> > is
> > > not yet build while reading port immutable property. Therefore it
> > > needs to read the default GID from the HCA for IB link layer to
> > > publish the subnet prefix.
> > >
> > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > > ---
> > >  drivers/infiniband/core/cache.c  |  4 ++--
> > > drivers/infiniband/core/device.c | 12 +++---------
> > >  2 files changed, 5 insertions(+), 11 deletions(-)
> >
> > This is quite a big thing to change.. I'd really like to see it done
> > though, the confusing API just does not seem necessary.
> >
> > I know the 'cache' was originally created because we needed to access
> > the GID entries without sleeping.
> 
> I think there was more than that.  I think Doug was needing to improve
> connection establishment times as well.  But maybe that was tied to pkeys or
> some other cached data.
> 
> > But why wasn't everything converted to use the cache? The only reason
> > I can think of is that there are situations where the cache does not
> > have new-enough data?
> 
> I think this was the case, but I can't remember the details.  And I haven't been
> able to find any emails or patches to refresh my memory. :/
> 
> > Ie a MAD updated the HCA and now have some race between cache and HCA?
> >
> > Is that possible? I'm nervous. Do you think it would make sense to add
> > some kind of 'force refresh' flag to this API and set it for all the
> > places that currently don't use the cache API? 'force refresh'
> > would guarentee that the latest data from the IB HCA is returned as we
> > have today, but otherwise still present the common cache API, and be a
> > NOP flag for roce?
> >
> > Sean: What do you think? Can you remember that far back any better?
> 
> A single API with a refresh flag makes sense to me.  I'll try to remember more
> details, but I've been struggling so far.  I think Roland was wanting to remove
> the caches at one point, but I don't remember the reason.
> 

ULP passing flag and doing guess work to pass flag or not just doesn't sound correct to me.
Additionally just because the caller is calling from non-blocking context, therefore it should call cached version doesn't sound correct.
Cache update fails handling is the role of core to retry the work when it is failed. It is not the ULP to worry and translate that into query API.
Uniform API among link layers is first step to get things right.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html