Re: XRC Questions for Mellanox and All

On Sun, 19 Nov 2017 12:02:10 +0000
"Amrani, Ram" <Ram.Amrani@xxxxxxxxxx> wrote:


Hi Ram,
I apologize for the delay in responding.

The Linux XRC implementation does not provide XRC-specific errors.
Instead, the responder treats them as local access violations. For both
Invalid XRCETH and XRC Domain Violation errors, the responder side
reports a local access violation asynchronous error
(IB_EVENT_QP_ACCESS_ERR). The responder's QP then transitions to the
ERR state.

At the requester side, these errors are perceived as Remote Invalid
Request errors, and generate IB_WC_REM_INV_REQ_ERR completions. The
local QP then transitions to the ERR state.
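
As a rough illustration of the mapping described above, here is a minimal
sketch of how a consumer might recognize codes that could stem from an XRC
violation. The enum and struct definitions are reduced stand-ins so that the
snippet builds outside the kernel tree, and both helper names are
hypothetical. Only IB_EVENT_QP_ACCESS_ERR and IB_WC_REM_INV_REQ_ERR are taken
from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Stand-ins so the sketch builds outside the kernel tree. The names mirror
 * include/rdma/ib_verbs.h, but the numeric values and the reduced struct
 * layout are placeholders, not the kernel's definitions.
 */
enum ib_event_type {
	IB_EVENT_QP_FATAL,
	IB_EVENT_QP_REQ_ERR,
	IB_EVENT_QP_ACCESS_ERR,
};

enum ib_wc_status {
	IB_WC_SUCCESS,
	IB_WC_REM_INV_REQ_ERR,
	IB_WC_REM_ACCESS_ERR,
};

struct ib_event {
	enum ib_event_type event;
};

/*
 * Hypothetical helper for the responder side: an XRCETH or XRC domain
 * violation surfaces as a generic access error on the XRC TGT QP.
 */
bool xrc_tgt_event_may_be_xrc_violation(const struct ib_event *ev)
{
	assert(ev != NULL);
	return ev->event == IB_EVENT_QP_ACCESS_ERR;
}

/*
 * Hypothetical helper for the requester side: the same violations surface
 * as a Remote Invalid Request completion status.
 */
bool xrc_ini_status_may_be_xrc_violation(enum ib_wc_status status)
{
	return status == IB_WC_REM_INV_REQ_ERR;
}
```

Since neither code is XRC-specific, a true result only means an XRC
violation is one possible cause; other access or invalid-request errors
report the same values.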

Operationally, then, the resources behave as they should upon XRC
errors.

On the responder side, XRC receive completions report the XRC TGT QP in
the "qp" field of struct ib_wc; the XRC SRQ number is returned in the
"src_qp" field.

(The src_qp field was previously used only for UD QPs. There was no
reason not to reuse it for the XRC SRQ number, especially since doing
so avoided adding a new field to struct ib_wc.)
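
To make the field usage concrete, here is a minimal sketch of pulling both
values out of a receive completion. The struct definitions are reduced
stand-ins (the kernel's struct ib_wc and struct ib_qp have many more
members), and struct xrc_recv_info and xrc_recv_info_from_wc() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-ins for the kernel structures; only the members used here. */
struct ib_qp {
	uint32_t qp_num;
};

struct ib_wc {
	struct ib_qp *qp;	/* for XRC receive completions: the XRC TGT QP */
	uint32_t src_qp;	/* for XRC receive completions: the XRC SRQ number */
};

/* Hypothetical container for the two values of interest. */
struct xrc_recv_info {
	uint32_t tgt_qpn;
	uint32_t srq_num;
};

/*
 * Hypothetical helper. It assumes the caller has already established that
 * the completion is a receive completion for a WR posted to an XRC SRQ;
 * for UD completions src_qp keeps its usual source-QP meaning.
 */
struct xrc_recv_info xrc_recv_info_from_wc(const struct ib_wc *wc)
{
	struct xrc_recv_info info;

	assert(wc != NULL && wc->qp != NULL);
	info.tgt_qpn = wc->qp->qp_num;
	info.srq_num = wc->src_qp;
	return info;
}
```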

-Jack

> Hi Leon, Mellanox,
> Can you help get a comment on this e-mail?
> 
> (Added Sean and Roland as by git log they seem related too)
> 
> Thanks,
> Ram
> 
> 
> > -----Original Message-----
> > 
> > Hi Mellanox, All,
> > I've been reading XRC code, currently implemented only by Mellanox.
> > I have a few questions regarding specifications vs. implementation.
> > 
> > (1) The protocol specifies:
> > 11.6.3.2 AFFILIATED ASYNCHRONOUS ERRORS
> > ...
> > The following describes the new Affiliated Asynchronous Errors for
> > XRC TGT QPs:
> > * XRC Domain Violation - Responder's Receive Queue detected an
> > XRC Domain that does not match the XRC Domain of the XRC SRQ.
> > * Invalid XRCETH - Responder detected that the XRC SRQ does not
> > exist or is not in the right state or wire protocol violation.
> > 
> > I don't see any dedicated entries in the enum ib_event_type.
> > Why? How do you currently treat these errors?
> > 
> > (2) The protocol specifies:
> > 11.4.2.1 POLL FOR COMPLETION
> > ...
> > Output Modifiers:
> > ...
> > * Local XRC TGT QP Number. Returned only for completions
> > of WRs posted to XRC SRQs.
> > 
> > I don't see any dedicated field in the struct ib_wc.
> > Why? How do you currently return this value, if at all?
> > 
> > (3) The protocol specifies:
> > 11.4.2.1 POLL FOR COMPLETION
> > A new "XRC violation error" is returned for requests that caused
> > the responder to return a "NAK-Invalid RD Request" NAK. This could
> > have been caused by either a Remote XRC Domain Violation or an
> > XRCETH Violation as detailed in the transport section.
> > 
> > What entry from the enum ib_wc_status do you use for this?
> > 
> > Thanks,
> > Ram
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html  