On 24.05.24 03:52, Honggang LI wrote:
On Thu, May 23, 2024 at 05:03:12PM +0200, Zhu Yanjun wrote:
Subject: Re: [PATCH] RDMA/rxe: Fix responder length checking for UD request
packets
From: Zhu Yanjun <yanjun.zhu@xxxxxxxxx>
Date: Thu, 23 May 2024 17:03:12 +0200
On 23.05.24 14:06, Zhu Yanjun wrote:
On 23.05.24 11:46, Honggang LI wrote:
According to the IBA specification:
If a UD request packet is detected with an invalid length, the request
shall be an invalid request and it shall be silently dropped by
the responder. The responder then waits for a new request packet.
commit 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length
checking")
defers the responder length check for UD QPs to the function
`copy_data`, but this introduces a regression for UD QPs.
When a packet is too large to fit in the receive buffer,
`copy_data` returns the error code -EINVAL. `send_data_in` then
returns RESPST_ERR_MALFORMED_WQE, and the UD QP transitions into the
ERROR state.
Fixes: 689c5421bfe0 ("RDMA/rxe: Fix incorrect responder length
checking")
Signed-off-by: Honggang LI <honggangli@xxxxxxx>
---
drivers/infiniband/sw/rxe/rxe_resp.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c
b/drivers/infiniband/sw/rxe/rxe_resp.c
index 963382f625d7..a74f29dcfdc9 100644
--- a/drivers/infiniband/sw/rxe/rxe_resp.c
+++ b/drivers/infiniband/sw/rxe/rxe_resp.c
@@ -354,6 +354,18 @@ static enum resp_states
rxe_resp_check_length(struct rxe_qp *qp,
* receive buffer later. For rmda operations additional
* length checks are performed in check_rkey.
*/
+ if ((qp_type(qp) == IB_QPT_GSI) || (qp_type(qp) == IB_QPT_UD)) {
From IBA specification:
"
QP1, used for the General Services Interface (GSI).
•This QP uses the Unreliable Datagram transport service.
•All traffic to and from this QP uses any VL other than VL15.
•GSI packets arriving before the current packet’s command completes may
be dropped (i.e. the minimum queue depth of QP1 is one).
"
GSI traffic should consist of MAD packets, which have a fixed format. I am
not sure whether the payload of a GSI packet can exceed the size of the
recv buffer.
It's dangerous to trust that remote GSI request packets always fit in the
local receive buffer. A carefully crafted hostile GSI request packet can
put the remote QP1 into the ERROR state. That means the remote node can no
longer establish new RC QP connections.
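To make the failure mode concrete, here is a minimal standalone sketch of the two behaviours discussed above. It is not the rxe code: every `sketch_*` name, the struct layout, and the `early_check` flag are hypothetical, invented only for illustration. It assumes the early check simply compares the payload length against the posted receive buffer and drops the packet for UD/GSI QPs, while the late copy path treats an oversized payload as fatal.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the rxe driver's real types. */
enum sketch_qp_type { SKETCH_QPT_RC, SKETCH_QPT_UD, SKETCH_QPT_GSI };
enum sketch_qp_state { SKETCH_QPS_READY, SKETCH_QPS_ERROR };
enum sketch_verdict { SKETCH_ACCEPT, SKETCH_DROP, SKETCH_FATAL };

struct sketch_qp {
	enum sketch_qp_type type;
	enum sketch_qp_state state;
	size_t recv_buf_len;	/* bytes available in the posted receive buffer */
};

/* Early length check: datagram QPs (UD and GSI) silently drop oversized requests. */
enum sketch_verdict sketch_check_length(const struct sketch_qp *qp,
					size_t payload_len)
{
	int is_datagram = qp->type == SKETCH_QPT_UD ||
			  qp->type == SKETCH_QPT_GSI;

	if (is_datagram && payload_len > qp->recv_buf_len)
		return SKETCH_DROP;
	return SKETCH_ACCEPT;
}

/* Late copy: models the regression, where an oversized payload is fatal. */
enum sketch_verdict sketch_copy(struct sketch_qp *qp, size_t payload_len)
{
	if (payload_len > qp->recv_buf_len) {
		qp->state = SKETCH_QPS_ERROR;
		return SKETCH_FATAL;
	}
	return SKETCH_ACCEPT;
}

/* Responder path; early_check selects whether the early length check runs. */
enum sketch_verdict sketch_receive(struct sketch_qp *qp, size_t payload_len,
				   int early_check)
{
	if (early_check &&
	    sketch_check_length(qp, payload_len) == SKETCH_DROP)
		return SKETCH_DROP;	/* QP state is left untouched */
	return sketch_copy(qp, payload_len);
}
```

With `early_check` set, an oversized request to a GSI QP is dropped and the QP stays usable. Without it, the same packet moves the QP into the error state, which is the behaviour a hostile sender could exploit.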
Thanks, Honggang.
Based on our discussion, this seems to be a security problem, and it seems
to be related to MLX5. Until the MLX5 engineers look into it, this commit
at least keeps RXE from getting stuck in the ERROR state.
LGTM.
Zhu Yanjun
Thanks