On 1/21/21 4:06 PM, Bob Pearson wrote: > On 1/20/21 6:20 PM, Jason Gunthorpe wrote: >> On Tue, Jan 19, 2021 at 03:49:47PM -0600, Bob Pearson wrote: >> >>> read_lock_irqsave(&pool->pool_lock, flags); >>> - obj = rxe_alloc_nl(pool); >>> + if (pool->state != RXE_POOL_STATE_VALID) { >>> + read_unlock_irqrestore(&pool->pool_lock, flags); >>> + return NULL; >>> + } >>> + >>> + kref_get(&pool->ref_cnt); >>> read_unlock_irqrestore(&pool->pool_lock, flags); >> >> What is this lock actually protecting?? Every data read under it is not >> written under the lock?? Even the memory storing the lock could be kfreed. >> >> This should all be fixed by relying on the ib_device kref. Today rxe >> only frees the pools from ops->dealloc_driver which can only happen >> once the ib_device_try_get() is all put back >> >> So the structure you want is to never call rxe_alloc from anything >> that is not holding the ib_device_try(), or implicitly under a client. >> >> Basically this is already everything in ops - so things work >> queues/etc internal to rxe need attention. >> >> I'm not completely sure why rxe_alloc does a ib_device_try_get() >> either. Every verbs object is also guarneteed to be cleaned up before >> dealloc_driver is called. >> >> Most likely thing is all of this can just be deleted. >> >> Jason >> > > The pool lock will only get freed if the rxe device struct gets freed and that is > after the end of this world. > > I had no idea what ib_device_try_get() was about so I didn't touch it. > I don't believe that rxe_alloc() ever gets called except from verbs/ops. None of > the worker threads ever create new objects. So I will delete those ..._get ..._put. > > If you are right then pool->state is not needed either and then the locking is not > needed for alloc for races with shutdown. Will still want the lock for things like > the mcast bug where we only want to create one instance of an object or two threads > looking up an object from a RB tree. > > bob > [fixed typo in address] On looking some more rxe_alloc() takes a reference on the pool for each new object and also takes a reference on the ib_device. These don't really add much value since the pool references are enough to catch counting bugs.