On Thursday, August 08/08/19, 2019 at 15:05:12 +0000, Bernard Metzler wrote: > -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: ----- > > >To: "Tom Talpey" <tom@xxxxxxxxxx>, <BMT@xxxxxxxxxxxxxx> > >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> > >Date: 08/05/2019 07:26PM > >Cc: "jgg@xxxxxxxx" <jgg@xxxxxxxx>, "linux-rdma@xxxxxxxxxxxxxxx" > ><linux-rdma@xxxxxxxxxxxxxxx>, "Potnuri Bharat Teja" > ><bharat@xxxxxxxxxxx>, "Nirranjan Kirubaharan" <nirranjan@xxxxxxxxxxx> > >Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler tries > >to read beyond MPA message > > > >On Friday, August 08/02/19, 2019 at 18:17:37 +0530, Tom Talpey wrote: > >> On 8/2/2019 7:18 AM, Bernard Metzler wrote: > >> > -----"Tom Talpey" <tom@xxxxxxxxxx> wrote: ----- > >> > > >> >> To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>, "Krishnamraju > >Eraparaju" > >> >> <krishna2@xxxxxxxxxxx> > >> >> From: "Tom Talpey" <tom@xxxxxxxxxx> > >> >> Date: 08/01/2019 08:54PM > >> >> Cc: jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, > >bharat@xxxxxxxxxxx, > >> >> nirranjan@xxxxxxxxxxx, krishn2@xxxxxxxxxxx > >> >> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler > >tries > >> >> to read beyond MPA message > >> >> > >> >> On 8/1/2019 6:56 AM, Bernard Metzler wrote: > >> >>> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: > >----- > >> >>> > >> >>>> To: jgg@xxxxxxxx, bmt@xxxxxxxxxxxxxx > >> >>>> From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> > >> >>>> Date: 07/31/2019 12:34PM > >> >>>> Cc: linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx, > >> >>>> nirranjan@xxxxxxxxxxx, krishn2@xxxxxxxxxxx, "Krishnamraju > >> >> Eraparaju" > >> >>>> <krishna2@xxxxxxxxxxx> > >> >>>> Subject: [EXTERNAL] [PATCH for-rc] siw: MPA Reply handler > >tries to > >> >>>> read beyond MPA message > >> >>>> > >> >>>> while processing MPA Reply, SIW driver is trying to read extra > >4 > >> >>>> bytes > >> >>>> than what peer has advertised as private data length. > >> >>>> > >> >>>> If a FPDU data is received before even siw_recv_mpa_rr() > >completed > >> >>>> reading MPA reply, then ksock_recv() in siw_recv_mpa_rr() > >could > >> >> also > >> >>>> read FPDU, if "size" is larger than advertised MPA reply > >length. > >> >>>> > >> >>>> 501 static int siw_recv_mpa_rr(struct siw_cep *cep) > >> >>>> 502 { > >> >>>> ............. > >> >>>> 572 > >> >>>> 573 if (rcvd > to_rcv) > >> >>>> 574 return -EPROTO; <----- Failure here > >> >>>> > >> >>>> Looks like the intention here is to throw an ERROR if the > >received > >> >>>> data > >> >>>> is more than the total private data length advertised by the > >peer. > >> >>>> But > >> >>>> reading beyond MPA message causes siw_cm to generate > >> >>>> RDMA_CM_EVENT_CONNECT_ERROR event when TCP socket recv buffer > >is > >> >>>> already > >> >>>> queued with FPDU messages. > >> >>>> > >> >>>> Hence, this function should only read upto private data > >length. > >> >>>> > >> >>>> Signed-off-by: Krishnamraju Eraparaju <krishna2@xxxxxxxxxxx> > >> >>>> --- > >> >>>> drivers/infiniband/sw/siw/siw_cm.c | 4 ++-- > >> >>>> 1 file changed, 2 insertions(+), 2 deletions(-) > >> >>>> > >> >>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c > >> >>>> b/drivers/infiniband/sw/siw/siw_cm.c > >> >>>> index a7cde98e73e8..8dc8cea2566c 100644 > >> >>>> --- a/drivers/infiniband/sw/siw/siw_cm.c > >> >>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c > >> >>>> @@ -559,13 +559,13 @@ static int siw_recv_mpa_rr(struct > >siw_cep > >> >> *cep) > >> >>>> * A private data buffer gets allocated if hdr->params.pd_len > >!= > >> >> 0. > >> >>>> */ > >> >>>> if (!cep->mpa.pdata) { > >> >>>> - cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL); > >> >>>> + cep->mpa.pdata = kmalloc(pd_len, GFP_KERNEL); > >> >>>> if (!cep->mpa.pdata) > >> >>>> return -ENOMEM; > >> >>>> } > >> >>>> rcvd = ksock_recv( > >> >>>> s, cep->mpa.pdata + cep->mpa.bytes_rcvd - sizeof(struct > >mpa_rr), > >> >>>> - to_rcv + 4, MSG_DONTWAIT); > >> >>>> + to_rcv, MSG_DONTWAIT); > >> >>>> > >> >>>> if (rcvd < 0) > >> >>>> return rcvd; > >> >>>> -- > >> >>>> 2.23.0.rc0 > >> >>>> > >> >>>> > >> >>> > >> >>> The intention of this code is to make sure the > >> >>> peer does not violates the MPA handshake rules. > >> >>> The initiator MUST NOT send extra data after its > >> >>> MPA request and before receiving the MPA reply. > >> >> > >> >> I think this is true only for MPA v2. With MPA v1, the > >> >> initiator can begin sending immediately (before receiving > >> >> the MPA reply), because there is no actual negotiation at > >> >> the MPA layer. > >> >> > >> >> With MPA v2, the negotiation exchange is required because > >> >> the type of the following message is predicated by the > >> >> "Enhanced mode" a|b|c|d flags present in the first 32 bits > >> >> of the private data buffer. > >> >> > >> >> So, it seems to me that additional logic is needed here to > >> >> determine the MPA version, before sniffing additional octets > >> >>from the incoming stream? > >> >> > >> >> Tom. > >> >> > >> > > >> > There still is the marker negotiation taking place. > >> > RFC 5044 says in section 7.1.2: > >> > > >> > "Note: Since the receiver's ability to deal with Markers is > >> > unknown until the Request and Reply Frames have been > >> > received, sending FPDUs before this occurs is not possible." > >> > > >> > This section further discusses the responder's behavior, > >> > where it MUST receive a first FPDU from the initiator > >> > before sending its first FPDU: > >> > > >> > "4. MPA Responder mode implementations MUST receive and validate > >at > >> > least one FPDU before sending any FPDUs or Markers." > >> > > >> > So it appears with MPA version 1, the current siw > >> > code is correct. The initiator is entering FPDU mode > >> > first, and only after receiving the MPA reply frame. > >> > Only after the initiator sent a first FPDU, the responder > >> > can start using the connection in FPDU mode. > >> > Because of this somehow broken connection establishment > >> > procedure (only the initiator can start sending data), a > >> > later MPA version makes this first FPDU exchange explicit > >> > and selectable (zero length READ/WRITE/Send). > >> > >> Yeah, I guess so. Because nobody ever actually implemented markers, > >> I think that they may more or less passively ignore this. But > >you're > >> currect that it's invalid protocol behavior. > >> > >> If your testing didn't uncover any issues with existing > >implementations > >> failing to connect with your strict checking, I'm ok with it. > >Tom & Bernard, > >Thanks for the insight on MPA negotiation. > > > >Could the below patch be considered as a proper fix? > > > >diff --git a/drivers/infiniband/sw/siw/siw_cm.c > >b/drivers/infiniband/sw/siw/siw_cm.c > >index 9ce8a1b925d2..0aec1b5212f9 100644 > >--- a/drivers/infiniband/sw/siw/siw_cm.c > >+++ b/drivers/infiniband/sw/siw/siw_cm.c > >@@ -503,6 +503,7 @@ static int siw_recv_mpa_rr(struct siw_cep *cep) > > struct socket *s = cep->sock; > > u16 pd_len; > > int rcvd, to_rcv; > >+ int extra_data_check = 4; /* 4Bytes, for MPA rules violation > >checking */ > > > > if (cep->mpa.bytes_rcvd < sizeof(struct mpa_rr)) { > > rcvd = ksock_recv(s, (char *)hdr + > >cep->mpa.bytes_rcvd, > >@@ -553,23 +554,37 @@ static int siw_recv_mpa_rr(struct siw_cep *cep) > > return -EPROTO; > > } > > > >+ /* > >+ * Peer must not send any extra data other than MPA messages > >until MPA > >+ * negotiation is completed, an exception is MPA V2 > >client-server Mode, > >+ * IE, in this mode the peer after sending MPA Reply can > >immediately > >+ * start sending data in RDMA mode. > >+ */ > > This is unfortunately not true. The responder NEVER sends > an FPDU without having seen an FPDU from the initiator. > I just checked RFC 6581 again. The RTR (ready-to-receive) > indication from the initiator is still needed, but now > provided by the protocol and not the application: w/o > 'enhanced MPA setup', the initiator sends the first > FPDU as an application message. With 'enhanced MPA setup', > the initiator protocol entity sends (w/o application > interaction) a zero length READ/WRITE/Send as a first FPDU, > as previously negotiated with the responder. Again: only > after the responder has seen the first FPDU, it can > start sending in FPDU mode. If I understand your statements correctly: in MPA v2 clint-server mode, the responder application should have some logic to wait(after ESTABLISH event) until the first FPDU message is received from the initiator? Thanks, Krishna. > > Sorry for the confusion. But the current > siw code appears to be just correct. > > Thanks > Bernard > > > > >+ if ((__mpa_rr_revision(cep->mpa.hdr.params.bits) == > >MPA_REVISION_2) && > >+ (cep->state == SIW_EPSTATE_AWAIT_MPAREP)) { > >+ int mpa_p2p_mode = cep->mpa.v2_ctrl_req.ord & > >+ (MPA_V2_RDMA_WRITE_RTR | > >MPA_V2_RDMA_READ_RTR); > >+ if (!mpa_p2p_mode) > >+ extra_data_check = 0; > >+ } > >+ > > /* > > * At this point, we must have hdr->params.pd_len != 0. > > * A private data buffer gets allocated if hdr->params.pd_len > >!= > > * 0. > > */ > > if (!cep->mpa.pdata) { > >- cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL); > >+ cep->mpa.pdata = kmalloc(pd_len + extra_data_check, > >GFP_KERNEL); > > if (!cep->mpa.pdata) > > return -ENOMEM; > > } > > rcvd = ksock_recv( > > s, cep->mpa.pdata + cep->mpa.bytes_rcvd - > >sizeof(struct > >mpa_rr), > >- to_rcv + 4, MSG_DONTWAIT); > >+ to_rcv + extra_data_check, MSG_DONTWAIT); > > > > if (rcvd < 0) > > return rcvd; > > > >- if (rcvd > to_rcv) > >+ if (extra_data_check && (rcvd > to_rcv)) > > return -EPROTO; > > > > cep->mpa.bytes_rcvd += rcvd; > > > >-Krishna. > >> > >> Tom. > >> > >> > >> > >> > > >> > Bernard. > >> > > >> >> > >> >>> So, for the MPA request case, this code is needed > >> >>> to check for protocol correctness. > >> >>> You are right for the MPA reply case - if we are > >> >>> _not_ in peer2peer mode, the peer can immediately > >> >>> start sending data in RDMA mode after its MPA Reply. > >> >>> So we shall add appropriate code to be more specific > >> >>> For an error, we are (1) processing an MPA Request, > >> >>> OR (2) processing an MPA Reply AND we are not expected > >> >>> to send an initial READ/WRITE/Send as negotiated with > >> >>> the peer (peer2peer mode MPA handshake). > >> >>> > >> >>> Just removing this check would make siw more permissive, > >> >>> but to a point where peer MPA protocol errors are > >> >>> tolerated. I am not in favor of that level of > >> >>> forgiveness. > >> >>> > >> >>> If possible, please provide an appropriate patch > >> >>> or (if it causes current issues with another peer > >> >>> iWarp implementation) just run in MPA peer2peer mode, > >> >>> where the current check is appropriate. > >> >>> Otherwise, I would provide an appropriate fix by Monday > >> >>> (I am still out of office this week). > >> >>> > >> >>> > >> >>> Many thanks and best regards, > >> >>> Bernard. > >> >>> > >> >>> > >> >>> > >> >> > >> >> > >> > > >> > > >> > > > > > >