On 07/09/2022 18:18, Christoph Hellwig wrote:
On Wed, Sep 07, 2022 at 06:16:05PM +0300, Sagi Grimberg wrote:
This entire code needs to move to the rdma core instead
of being leaked to ulps.
We can move it, but you will lose the connection between the queue number,
the caller and the error itself.
That still doesn't explain why nvme-rdma is special.
In any event, the ulp can log the qpn so the context can be interrogated
if that is important.
I also don't see why the QP event handler can't be called
from user context to start with. I see absolutely no reason to
add boilerplate code to drivers for reporting slightly more verbose
errors on one specific piece of hardware. I'd say clean up the mess
that is the QP event handler first, and then once error reporting
becomes trivial we can just do it.
I would like to emphasize that this is not just about a slightly more
verbose error. It is mainly about an error that wouldn't have been
reported at all without this feature. As I previously mentioned, there
are error cases in which the remote side doesn't generate a CQE, so
without this feature the remote side wouldn't even know why the QP was
moved to the error state.