On Wed, May 06, 2020 at 11:43:44AM -0300, Jason Gunthorpe wrote:
> On Wed, May 06, 2020 at 08:32:13AM +0300, Leon Romanovsky wrote:
> > From: Jack Morgenstein <jackm@xxxxxxxxxxxxxxxxxx>
> >
> > The IB core pkey cache is populated by procedure ib_cache_update().
> > Initially, the pkey cache pointer is NULL. ib_cache_update allocates
> > a buffer and populates it with the device's pkeys, via repeated calls
> > to procedure ib_query_pkey().
> >
> > If there is a failure in populating the pkey buffer via ib_query_pkey(),
> > ib_cache_update does not replace the old pkey buffer cache with the
> > updated one -- it leaves the old cache as is.
> >
> > Since initially the pkey buffer cache is NULL, when calling
> > ib_cache_update the first time, a failure in ib_query_pkey() will cause
> > the pkey buffer cache pointer to remain NULL.
> >
> > In this situation, any subsequent calls to ib_get_cached_pkey(),
> > ib_find_cached_pkey(), or ib_find_cached_pkey_exact() will try to
> > dereference the NULL pkey cache pointer, causing a kernel panic.
> >
> > Fix this by checking the ib_cache_update() return value.
> >
> > Fixes: 8faea9fd4a39 ("RDMA/cache: Move the cache per-port data into the main ib_port_data")
> > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > Signed-off-by: Jack Morgenstein <jackm@xxxxxxxxxxxxxxxxxx>
> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > ---
> > Changelog:
> > v1: I rewrote the patch to take care of ib_cache_update() return value.
> > v0: https://lore.kernel.org/linux-rdma/20200426075811.129814-1-leon@xxxxxxxxxx
> > ---
> >  drivers/infiniband/core/cache.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > --
> > 2.26.2
> >
> > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> > index 717b798cddad..1cbebfa374a5 100644
> > --- a/drivers/infiniband/core/cache.c
> > +++ b/drivers/infiniband/core/cache.c
> > @@ -1553,10 +1553,17 @@ int ib_cache_setup_one(struct ib_device *device)
> >  	if (err)
> >  		return err;
> >
> > -	rdma_for_each_port (device, p)
> > -		ib_cache_update(device, p, true);
> > +	rdma_for_each_port (device, p) {
> > +		err = ib_cache_update(device, p, true);
> > +		if (err)
> > +			goto out;
> > +	}
> >
> >  	return 0;
> > +
> > +out:
> > +	ib_cache_release_one(device);
> > +	return err;
>
> ib_cache_release_one can be called only once, and it is always called
> by ib_device_release(), it should not be called here

It doesn't sound right to rely on ib_device_release() to unwind an error
in ib_cache_setup_one(). I don't think we should return from
ib_cache_setup_one() without cleaning up after it.

Thanks

>
> Jason
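
For readers following the thread, here is a minimal userspace sketch of the
failure mode described in the commit message and of the "check the return
value" fix. It is an illustration only, not the kernel code: struct port,
struct pkey_cache, query_pkey(), cache_update(), cache_setup() and
cache_release() are hypothetical stand-ins for the IB core structures and
for ib_query_pkey(), ib_cache_update(), ib_cache_setup_one() and
ib_cache_release_one(), and the query_fails field is a knob invented to
force a failure. Where the unwinding should happen is the open question in
this thread, so the sketch only propagates the error and leaves the release
to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are NOT the real IB core types. */
struct pkey_cache {
	int table_len;
	unsigned short table[4];
};

struct port {
	struct pkey_cache *pkey_cache;	/* NULL until the first successful update */
	int query_fails;		/* invented knob: force the query to fail */
};

/* Stand-in for ib_query_pkey(): fails on demand. */
int query_pkey(const struct port *port, int index, unsigned short *pkey)
{
	if (port->query_fails)
		return -EIO;
	*pkey = (unsigned short)(0x8000 | index);
	return 0;
}

/*
 * Stand-in for ib_cache_update(): on failure the old cache is left as is,
 * which on the very first call means the pointer stays NULL.
 */
int cache_update(struct port *port)
{
	struct pkey_cache *new_cache = calloc(1, sizeof(*new_cache));
	int i, err;

	if (!new_cache)
		return -ENOMEM;

	new_cache->table_len = 4;
	for (i = 0; i < new_cache->table_len; i++) {
		err = query_pkey(port, i, &new_cache->table[i]);
		if (err) {
			free(new_cache);
			return err;	/* old cache (possibly NULL) is kept */
		}
	}

	free(port->pkey_cache);
	port->pkey_cache = new_cache;
	return 0;
}

/* Setup that propagates the first error instead of ignoring it. */
int cache_setup(struct port *ports, int nports)
{
	int p, err;

	for (p = 0; p < nports; p++) {
		err = cache_update(&ports[p]);
		if (err)
			return err;
	}
	return 0;
}

/* Release whatever was allocated; called by the owner of the ports. */
void cache_release(struct port *ports, int nports)
{
	int p;

	for (p = 0; p < nports; p++) {
		free(ports[p].pkey_cache);
		ports[p].pkey_cache = NULL;
	}
}
```

With the return value checked, a caller learns that setup failed and never
reaches code that would dereference a NULL pkey_cache; without the check,
setup reports success while the failing port's cache pointer is still NULL.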