On Wed, May 08, 2019 at 10:30:28AM -0300, Jason Gunthorpe wrote: > On Wed, May 08, 2019 at 09:26:00AM +0300, Leon Romanovsky wrote: > > On Tue, May 07, 2019 at 01:13:04PM -0300, Jason Gunthorpe wrote: > > > On Mon, May 06, 2019 at 04:38:27PM -0400, Doug Ledford wrote: > > > > So, Jason and I were discussing the soft-iWARP driver submission, and he > > > > thought it would be good to know if it even works with the various iWARP > > > > hardware devices. I happen to have most of them on hand in one form or > > > > another, so I set down to test it. In the process, I ran across some > > > > issues just with the hardware versions themselves, let alone with soft- > > > > iWARP. So, here's the results of my matrix of tests. These aren't > > > > performance tests, just basic "does it work" smoke tests... > > > > > > Well, lets imagine to merge this at 5.2-rc1? > > > > Can we do something with kref in QPs and MRs before merging it? > > > > I'm super worried that memory model and locking used in this driver > > won't allow me to continue with allocation patches? > > Well, this use of idr doesn't look right to me: > > static inline struct siw_qp *siw_qp_id2obj(struct siw_device *sdev, int id) > { > struct siw_qp *qp = idr_find(&sdev->qp_idr, id); > > if (likely(qp && kref_get_unless_zero(&qp->ref))) > return qp; > > kref_get_unless_zero is nonsense unless used with someting like rcu, > and there is no rcu read lock here. > > Also, IDR's have to be locked.. > > It probably wants to be written as > > xa_lock() > qp = xa_load() > if (qp) > kref_get(&qp->ref); > xa_unlock() > > But I'm not completely sure what this is all about.. A QP cannot > really exist past destroy - about the only thing that would make sense > is to leave some memory around so other things can see it is failed - > but generally it is better to wipe out the QP from those other things > then attempt to do reference counting like this. No, no,, no, it is still not enough. I need to be sure that destroy path always successes and kref_get(&qp->ref) doesn't guarantee that. The good coding pattern can be seen in rdmavt https://elixir.bootlin.com/linux/latest/source/drivers/infiniband/sw/rdmavt/cq.c#L316 They krefing and releasing extra structure outside of user visible object. > > The only thing that seems to need this is the siw_cep, and I'm not > sure what this object is about or how it should function if the QP is > destroyed. > > Jason