On 8/1/2019 6:56 AM, Bernard Metzler wrote:
-----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
To: jgg@xxxxxxxx, bmt@xxxxxxxxxxxxxx
From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
Date: 07/31/2019 12:34PM
Cc: linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
nirranjan@xxxxxxxxxxx, krishn2@xxxxxxxxxxx, "Krishnamraju Eraparaju"
<krishna2@xxxxxxxxxxx>
Subject: [EXTERNAL] [PATCH for-rc] siw: MPA Reply handler tries to
read beyond MPA message
while processing MPA Reply, SIW driver is trying to read extra 4
bytes
than what peer has advertised as private data length.
If a FPDU data is received before even siw_recv_mpa_rr() completed
reading MPA reply, then ksock_recv() in siw_recv_mpa_rr() could also
read FPDU, if "size" is larger than advertised MPA reply length.
501 static int siw_recv_mpa_rr(struct siw_cep *cep)
502 {
.............
572
573 if (rcvd > to_rcv)
574 return -EPROTO; <----- Failure here
Looks like the intention here is to throw an ERROR if the received
data
is more than the total private data length advertised by the peer.
But
reading beyond MPA message causes siw_cm to generate
RDMA_CM_EVENT_CONNECT_ERROR event when TCP socket recv buffer is
already
queued with FPDU messages.
Hence, this function should only read upto private data length.
Signed-off-by: Krishnamraju Eraparaju <krishna2@xxxxxxxxxxx>
---
drivers/infiniband/sw/siw/siw_cm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/sw/siw/siw_cm.c
b/drivers/infiniband/sw/siw/siw_cm.c
index a7cde98e73e8..8dc8cea2566c 100644
--- a/drivers/infiniband/sw/siw/siw_cm.c
+++ b/drivers/infiniband/sw/siw/siw_cm.c
@@ -559,13 +559,13 @@ static int siw_recv_mpa_rr(struct siw_cep *cep)
* A private data buffer gets allocated if hdr->params.pd_len != 0.
*/
if (!cep->mpa.pdata) {
- cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL);
+ cep->mpa.pdata = kmalloc(pd_len, GFP_KERNEL);
if (!cep->mpa.pdata)
return -ENOMEM;
}
rcvd = ksock_recv(
s, cep->mpa.pdata + cep->mpa.bytes_rcvd - sizeof(struct mpa_rr),
- to_rcv + 4, MSG_DONTWAIT);
+ to_rcv, MSG_DONTWAIT);
if (rcvd < 0)
return rcvd;
--
2.23.0.rc0
The intention of this code is to make sure the
peer does not violates the MPA handshake rules.
The initiator MUST NOT send extra data after its
MPA request and before receiving the MPA reply.
I think this is true only for MPA v2. With MPA v1, the
initiator can begin sending immediately (before receiving
the MPA reply), because there is no actual negotiation at
the MPA layer.
With MPA v2, the negotiation exchange is required because
the type of the following message is predicated by the
"Enhanced mode" a|b|c|d flags present in the first 32 bits
of the private data buffer.
So, it seems to me that additional logic is needed here to
determine the MPA version, before sniffing additional octets
from the incoming stream?
Tom.
So, for the MPA request case, this code is needed
to check for protocol correctness.
You are right for the MPA reply case - if we are
_not_ in peer2peer mode, the peer can immediately
start sending data in RDMA mode after its MPA Reply.
So we shall add appropriate code to be more specific
For an error, we are (1) processing an MPA Request,
OR (2) processing an MPA Reply AND we are not expected
to send an initial READ/WRITE/Send as negotiated with
the peer (peer2peer mode MPA handshake).
Just removing this check would make siw more permissive,
but to a point where peer MPA protocol errors are
tolerated. I am not in favor of that level of
forgiveness.
If possible, please provide an appropriate patch
or (if it causes current issues with another peer
iWarp implementation) just run in MPA peer2peer mode,
where the current check is appropriate.
Otherwise, I would provide an appropriate fix by Monday
(I am still out of office this week).
Many thanks and best regards,
Bernard.