On Wed, May 06, 2020 at 03:09:36PM -0300, Jason Gunthorpe wrote:
> On Wed, May 06, 2020 at 07:56:08PM +0300, Leon Romanovsky wrote:
> > On Wed, May 06, 2020 at 11:43:44AM -0300, Jason Gunthorpe wrote:
> > > On Wed, May 06, 2020 at 08:32:13AM +0300, Leon Romanovsky wrote:
> > > > From: Jack Morgenstein <jackm@xxxxxxxxxxxxxxxxxx>
> > > >
> > > > The IB core pkey cache is populated by procedure ib_cache_update().
> > > > Initially, the pkey cache pointer is NULL. ib_cache_update allocates
> > > > a buffer and populates it with the device's pkeys, via repeated calls
> > > > to procedure ib_query_pkey().
> > > >
> > > > If there is a failure in populating the pkey buffer via ib_query_pkey(),
> > > > ib_cache_update does not replace the old pkey buffer cache with the
> > > > updated one -- it leaves the old cache as is.
> > > >
> > > > Since initially the pkey buffer cache is NULL, when calling
> > > > ib_cache_update the first time, a failure in ib_query_pkey() will cause
> > > > the pkey buffer cache pointer to remain NULL.
> > > >
> > > > In this situation, any calls subsequent to ib_get_cached_pkey(),
> > > > ib_find_cached_pkey(), or ib_find_cached_pkey_exact() will try to
> > > > dereference the NULL pkey cache pointer, causing a kernel panic.
> > > >
> > > > Fix this by checking the ib_cache_update() return value.
> > > >
> > > > Fixes: 8faea9fd4a39 ("RDMA/cache: Move the cache per-port data into the main ib_port_data")
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Signed-off-by: Jack Morgenstein <jackm@xxxxxxxxxxxxxxxxxx>
> > > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > > > Changelog:
> > > > v1: I rewrote the patch to take care of ib_cache_update() return value.
> > > > v0: https://lore.kernel.org/linux-rdma/20200426075811.129814-1-leon@xxxxxxxxxx
> > > >  drivers/infiniband/core/cache.c | 11 +++++++++--
> > > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c
> > > > index 717b798cddad..1cbebfa374a5 100644
> > > > +++ b/drivers/infiniband/core/cache.c
> > > > @@ -1553,10 +1553,17 @@ int ib_cache_setup_one(struct ib_device *device)
> > > >  	if (err)
> > > >  		return err;
> > > >
> > > > -	rdma_for_each_port (device, p)
> > > > -		ib_cache_update(device, p, true);
> > > > +	rdma_for_each_port (device, p) {
> > > > +		err = ib_cache_update(device, p, true);
> > > > +		if (err)
> > > > +			goto out;
> > > > +	}
> > > >
> > > >  	return 0;
> > > > +
> > > > +out:
> > > > +	ib_cache_release_one(device);
> > > > +	return err;
> > >
> > > ib_cache_release_one() can be called only once, and it is always called
> > > by ib_device_release(), so it should not be called here
> >
> > It doesn't sound right if we rely on ib_device_release() to unwind error
> > in ib_cache_setup_one(). I don't think that we need to return from
> > ib_cache_setup_one() without cleaning it.
>
> We do as ib_cache_release_one() cannot be called multiple times

Do you want me to respin?

>
> The general design of all this pre-registration stuff is that the
> release function does the clean up and the individual functions should
> not error unwind cleanup done in the unconditional release.
>
> Other schemes were too complicated

It doesn't mean that it is right :)

Thanks

>
> Jason
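
For readers following the disagreement, here is a minimal userspace sketch of the design Jason describes: the setup step only reports an error, and a single unconditional release does all the cleanup. Everything prefixed with toy_ is hypothetical and only mirrors the shape of ib_cache_setup_one(), ib_cache_update() and ib_cache_release_one(); it is not the kernel code, and the fail_port and release_calls fields exist purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define TOY_NUM_PORTS 2

/* Hypothetical stand-in for struct ib_device; not the kernel's layout. */
struct toy_device {
	int *pkey_cache[TOY_NUM_PORTS];	/* NULL until a port is populated */
	int fail_port;			/* port whose update fails, or -1 */
	int release_calls;		/* how many times release has run */
};

/* Stand-in for ib_cache_update(): populate one port's cache or fail. */
static int toy_cache_update(struct toy_device *dev, int port)
{
	int *buf;

	if (port == dev->fail_port)
		return -EIO;		/* old cache (possibly NULL) is kept */

	buf = calloc(4, sizeof(*buf));
	if (!buf)
		return -ENOMEM;

	free(dev->pkey_cache[port]);
	dev->pkey_cache[port] = buf;
	return 0;
}

/* Setup propagates the error but does not unwind; release owns cleanup. */
static int toy_cache_setup_one(struct toy_device *dev)
{
	int port, err;

	for (port = 0; port < TOY_NUM_PORTS; port++) {
		err = toy_cache_update(dev, port);
		if (err)
			return err;
	}
	return 0;
}

/* Must run exactly once per device; tolerates partially built state. */
static void toy_cache_release_one(struct toy_device *dev)
{
	int port;

	assert(dev->release_calls == 0);	/* a second call is a bug */
	dev->release_calls++;

	for (port = 0; port < TOY_NUM_PORTS; port++) {
		free(dev->pkey_cache[port]);
		dev->pkey_cache[port] = NULL;
	}
}

/* Models the whole lifetime: setup, then the one unconditional release. */
static int toy_device_lifetime(struct toy_device *dev)
{
	int err = toy_cache_setup_one(dev);

	toy_cache_release_one(dev);
	return err;
}
```

In this sketch, calling toy_cache_release_one() from the setup error path as well would trip the assertion on the second call, which is the double-release concern raised against the `goto out` hunk. The cost of the scheme is that the release function has to cope with a half-initialized device, here by freeing pointers that may still be NULL.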