On Tue, May 28, 2024 at 01:10:00PM +0200, Zhu Yanjun wrote:
> On 24.05.24 03:52, Honggang LI wrote:
> > On Thu, May 23, 2024 at 05:03:12PM +0200, Zhu Yanjun wrote:
> > > Subject: Re: [PATCH] RDMA/rxe: Fix responder length checking for UD request
> > >  packets
> > > From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
> > > Date: Thu, 23 May 2024 17:03:12 +0200
> > >
> > >
> > > On 23.05.24 14:06, Zhu Yanjun wrote:
> > > >
> > > > On 23.05.24 11:46, Honggang LI wrote:
> > > > > According to the IBA specification:
> > > > > If a UD request packet is detected with an invalid length, the request
> > > > > shall be an invalid request and it shall be silently dropped by
> > > > > the responder. The responder then waits for a new request packet.
> > > > >
> > > > > commit 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length
> > > > > checking")
> > > > > defers responder length check for UD QPs in function `copy_data`.
> > > > > But it introduces a regression issue for UD QPs.
> > > > >
> > > > > When the packet size is too large to fit in the receive buffer.
> > > > > `copy_data` will return error code -EINVAL. Then `send_data_in`
> > > > > will return RESPST_ERR_MALFORMED_WQE. UD QP will transfer into
> > > > > ERROR state.
> > > > >
> > > > > Fixes: 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length
> > > > > checking")
> > > > > Signed-off-by: Honggang LI <honggangli@xxxxxxx>
> > > > > ---
> > > > >  drivers/infiniband/sw/rxe/rxe_resp.c | 12 ++++++++++++
> > > > >  1 file changed, 12 insertions(+)
> > > > >
> > > > > diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c
> > > > > b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > > > index 963382f625d7..a74f29dcfdc9 100644
> > > > > --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> > > > > +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> > > > > @@ -354,6 +354,18 @@ static enum resp_states
> > > > > rxe_resp_check_length(struct rxe_qp *qp,
> > > > >       * receive buffer later. For rmda operations additional
> > > > >       * length checks are performed in check_rkey.
> > > > >       */
> > > > > +    if ((qp_type(qp) == IB_QPT_GSI) || (qp_type(qp) == IB_QPT_UD)) {
> > > >
> > > > From IBA specification:
> > > >
> > > > "
> > > >
> > > > QP1, used for the General Services Interface (GSI).
> > > > • This QP uses the Unreliable Datagram transport service.
> > > > • All traffic to and from this QP uses any VL other than VL15.
> > > > • GSI packets arriving before the current packet’s command completes may
> > > >   be dropped (i.e. the minimum queue depth of QP1 is one).
> > > >
> > > > "
> > > >
> > > > GSI should be MAD packets. And it should have a fixed format. Not sure
> > > > if the payload of GSI packets will exceed the size of the recv buffer.
> > > >
> > It's dangerous to trust remote GSI request packets always fit in local
> > receive buffer. A well-designed hostile GSI request packet can render
> > remote QP1 into ERROR state. That means the remote node can't establish
> > new RC QP connections.
> >
> Thanks, Honggang.
> Based on our discussion, this seems to be a security problem. It seems that
> this problem is related with MLX5. Before MLX5 engineers jump into this
> problem, to RXE, this commit can avoid RXE hang in ERROR state.

Current RDMA network is designed with assumption that all participants
are trusted.

Thanks

>
> LGTM.
>
> Zhu Yanjun
>
> >
> > Thanks
> >
>
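
For readers following the thread, here is a minimal user-space sketch of the behaviour being discussed: an oversized UD or GSI request is dropped silently and the QP stays usable, instead of the QP moving to ERROR. This is not the patch body (the quoted hunk above is cut off after its first added line) and it is not rxe code. The names `model_qp`, `check_length`, `RESP_DROP`, `RESP_CONTINUE` and `RESP_DEFER_CHECK` are hypothetical, and `recv_buf_len` is assumed to be the space available for the payload in the posted receive buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QP types named in the patch. */
enum qp_type { QPT_RC, QPT_UD, QPT_GSI };

/* Hypothetical QP states; only READY and ERROR matter for this sketch. */
enum qp_state { QP_READY, QP_ERROR };

/* Hypothetical outcomes, not the real rxe resp_states values. */
enum resp_result {
	RESP_CONTINUE,    /* payload fits, keep processing the request */
	RESP_DROP,        /* silently discard the packet, QP state untouched */
	RESP_DEFER_CHECK  /* non-datagram QP: length is checked at a later stage */
};

struct model_qp {
	enum qp_type type;
	enum qp_state state;
	size_t recv_buf_len; /* assumed: bytes available for the payload */
};

/*
 * Early length check for datagram QPs (UD and GSI).
 * The QP is taken as const, so a drop cannot change its state.
 */
static enum resp_result check_length(const struct model_qp *qp,
				     size_t payload_len)
{
	assert(qp != NULL);

	if (qp->type == QPT_UD || qp->type == QPT_GSI) {
		if (payload_len > qp->recv_buf_len)
			return RESP_DROP;
		return RESP_CONTINUE;
	}

	/* Assumption: other QP types keep their existing, later checks. */
	return RESP_DEFER_CHECK;
}
```

In this model a 4096-byte request sent to a GSI QP with a 256-byte receive buffer yields `RESP_DROP` while the QP remains in `QP_READY`, which is the property Honggang's hostile-packet argument relies on.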