RE: [PATCH 2/2] RDMA/rxe: Handle remote errors in the midst of a Read reply sequence

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 13, 2022 2:36 PM Li Zhijian wrote:
> On 13/10/2022 09:47, Daisuke Matsuda wrote:
> > Requesting nodes do not handle a reported error correctly if it is
> > generated in the middle of multi-packet Read responses, and the node tries
> > to resend the request endlessly. Let completer terminate the connection in
> > that case.
> >
> > Signed-off-by: Daisuke Matsuda <matsuda-daisuke@xxxxxxxxxxx>
> > ---
> > FOR REVIEWERS:
> >    I referred to IB Specification Vol 1-Revision-1.5 to make this patch.
> >    Please see Ch.9.9.2.2, Ch.9.9.2.4.2 and Table 59.
> >
> >   drivers/infiniband/sw/rxe/rxe_comp.c | 8 ++++++++
> >   1 file changed, 8 insertions(+)
> >
> > diff --git a/drivers/infiniband/sw/rxe/rxe_comp.c b/drivers/infiniband/sw/rxe/rxe_comp.c
> > index fb0c008af78c..c9170dd99f3a 100644
> > --- a/drivers/infiniband/sw/rxe/rxe_comp.c
> > +++ b/drivers/infiniband/sw/rxe/rxe_comp.c
> > @@ -200,6 +200,10 @@ static inline enum comp_state check_psn(struct rxe_qp *qp,
> >   		 */
> >   		if (pkt->psn == wqe->last_psn)
> >   			return COMPST_COMP_ACK;
> > +		else if (pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE &&
> > +			 (qp->comp.opcode == IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST ||
> > +			  qp->comp.opcode == IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE))

Hi Zhijian,
Thank you for taking a look.

> When IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST or IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE will
> be assigned to qp->comp.opcode ?
This happens in rxe_completer(). Upon receiving a Read reply, 'state' is changed in the following order: 
[  101.215593] rdma_rxe: qp#17 state = GET ACK
[  101.216174] rdma_rxe: qp#17 state = GET WQE
[  101.216776] rdma_rxe: qp#17 state = CHECK PSN
[  101.217384] rdma_rxe: qp#17 state = CHECK ACK
[  101.218008] rdma_rxe: qp#17 state = READ
[  101.218556] rdma_rxe: qp#17 state = UPDATE COMP
[  101.219188] rdma_rxe: qp#17 state = DONE
'qp->comp.opcode' is filled at COMPST_UPDATE_COMP, and the value is retained
until it is overridden by the next incoming reply.

> I wonder if "(pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE) "  is enough ?
I am not very sure if it is better. I agree your suggestion is correct within the
IB transport layer, but RoCEv2 traffic is encapsulated in UDP/IP headers.
We may need to take it into consideration that UDP does not guarantee ordered
delivery of packets.

For example, responder can return coalesced ACKs for RDMA Write requests.
If ACK packets are swapped in the course of delivery, a packet with older psn can
arrive later, and probably it is to be discarded by returning COMPST_DONE here.

If we just put "(pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE) ",
I am afraid the swapped ACKs may go into wrong path. This is based on my
speculation, so please let me know if it is wrong.

Thanks,
Daisuke

> 
> Thanks
> Zhijian
> 
> 
> > +			return COMPST_CHECK_ACK;
> >   		else
> >   			return COMPST_DONE;
> >   	} else if ((diff > 0) && (wqe->mask & WR_ATOMIC_OR_READ_MASK)) {
> > @@ -228,6 +232,10 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
> >
> >   	case IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST:
> >   	case IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE:
> > +		/* Check NAK code to handle a remote error */
> > +		if (pkt->opcode == IB_OPCODE_RC_ACKNOWLEDGE)
> > +			break;
> > +
> >   		if (pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE &&
> >   		    pkt->opcode != IB_OPCODE_RC_RDMA_READ_RESPONSE_LAST) {
> >   			/* read retries of partial data may restart from





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux