On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
> > > > From: Israel Rukshin <israelr@xxxxxxxxxx>
> > > >
> > > > Add debug prints for fatal QP events that are helpful for finding the
> > > > root cause of the errors. ib_get_qp_err_syndrome is called from a
> > > > work queue since the QP event callback runs in interrupt context
> > > > and can't sleep.
> > > >
> > > > Signed-off-by: Israel Rukshin <israelr@xxxxxxxxxx>
> > > > Reviewed-by: Max Gurtovoy <mgurtovoy@xxxxxxxxxx>
> > > > Reviewed-by: Leon Romanovsky <leonro@xxxxxxxxxx>
> > >
> > > What makes nvme-rdma special here? Why do you get this in
> > > nvme-rdma and not srp/iser/nfs-rdma/rds/smc/ipoib etc?
> > >
> > > This entire code needs to move to the rdma core instead
> > > of being leaked to ulps.
> >
> > We can move, but you will lose the connection between queue number,
> > caller and the error itself.
>
> That still doesn't explain why nvme-rdma is special.

It was important for us to get proper review from at least one ULP;
nvme-rdma is not special at all.

> In any event, the ulp can log the qpn so the context can be interrogated
> if that is important.

ok
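
For readers following along, the deferral pattern the patch description refers to could look roughly like the sketch below: the QP event handler runs in interrupt context and must not sleep, so the sleeping ib_get_qp_err_syndrome() call is pushed to a workqueue. This is only an illustrative sketch, not the patch itself; the signature of ib_get_qp_err_syndrome() and names such as IB_ERR_SYNDROME_LENGTH, nvme_rdma_qp_err_work and nvme_rdma_qp_event are assumptions for illustration.

```c
/* Illustrative sketch only; signatures and names are assumed, not
 * taken from the actual patch series. */
struct nvme_rdma_qp_err_work {
	struct work_struct work;
	struct ib_qp *qp;
};

static void nvme_rdma_qp_err_work_fn(struct work_struct *work)
{
	struct nvme_rdma_qp_err_work *w =
		container_of(work, struct nvme_rdma_qp_err_work, work);
	char syndrome[IB_ERR_SYNDROME_LENGTH];	/* length macro assumed */

	/* Sleeping call: safe here in process context, not in the
	 * QP event handler itself. */
	if (!ib_get_qp_err_syndrome(w->qp, syndrome))
		pr_err("QP %u fatal event, syndrome: %s\n",
		       w->qp->qp_num, syndrome);
	kfree(w);
}

static void nvme_rdma_qp_event(struct ib_event *event, void *context)
{
	struct nvme_rdma_qp_err_work *w;

	if (event->event != IB_EVENT_QP_FATAL)
		return;

	/* GFP_ATOMIC: this callback may run in interrupt context. */
	w = kmalloc(sizeof(*w), GFP_ATOMIC);
	if (!w)
		return;
	w->qp = event->element.qp;
	INIT_WORK(&w->work, nvme_rdma_qp_err_work_fn);
	queue_work(system_wq, &w->work);
}
```

Moving this into the rdma core, as suggested above, would mean the core logs only what it knows (e.g. the qpn), while the ULP can correlate the qpn with its queue number and caller context.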