Re: [PATCH for-rc] siw: MPA Reply handler tries to read beyond MPA message

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thursday, August 08/08/19, 2019 at 15:05:12 +0000, Bernard Metzler wrote:
> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
> 
> >To: "Tom Talpey" <tom@xxxxxxxxxx>, <BMT@xxxxxxxxxxxxxx>
> >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
> >Date: 08/05/2019 07:26PM
> >Cc: "jgg@xxxxxxxx" <jgg@xxxxxxxx>, "linux-rdma@xxxxxxxxxxxxxxx"
> ><linux-rdma@xxxxxxxxxxxxxxx>, "Potnuri Bharat Teja"
> ><bharat@xxxxxxxxxxx>, "Nirranjan Kirubaharan" <nirranjan@xxxxxxxxxxx>
> >Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler tries
> >to read beyond MPA message
> >
> >On Friday, August 08/02/19, 2019 at 18:17:37 +0530, Tom Talpey wrote:
> >> On 8/2/2019 7:18 AM, Bernard Metzler wrote:
> >> > -----"Tom Talpey" <tom@xxxxxxxxxx> wrote: -----
> >> > 
> >> >> To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>, "Krishnamraju
> >Eraparaju"
> >> >> <krishna2@xxxxxxxxxxx>
> >> >> From: "Tom Talpey" <tom@xxxxxxxxxx>
> >> >> Date: 08/01/2019 08:54PM
> >> >> Cc: jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx,
> >bharat@xxxxxxxxxxx,
> >> >> nirranjan@xxxxxxxxxxx, krishn2@xxxxxxxxxxx
> >> >> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler
> >tries
> >> >> to read beyond MPA message
> >> >>
> >> >> On 8/1/2019 6:56 AM, Bernard Metzler wrote:
> >> >>> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote:
> >-----
> >> >>>
> >> >>>> To: jgg@xxxxxxxx, bmt@xxxxxxxxxxxxxx
> >> >>>> From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
> >> >>>> Date: 07/31/2019 12:34PM
> >> >>>> Cc: linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx,
> >> >>>> nirranjan@xxxxxxxxxxx, krishn2@xxxxxxxxxxx, "Krishnamraju
> >> >> Eraparaju"
> >> >>>> <krishna2@xxxxxxxxxxx>
> >> >>>> Subject: [EXTERNAL] [PATCH for-rc] siw: MPA Reply handler
> >tries to
> >> >>>> read beyond MPA message
> >> >>>>
> >> >>>> while processing MPA Reply, SIW driver is trying to read extra
> >4
> >> >>>> bytes
> >> >>>> than what peer has advertised as private data length.
> >> >>>>
> >> >>>> If a FPDU data is received before even siw_recv_mpa_rr()
> >completed
> >> >>>> reading MPA reply, then ksock_recv() in siw_recv_mpa_rr()
> >could
> >> >> also
> >> >>>> read FPDU, if "size" is larger than advertised MPA reply
> >length.
> >> >>>>
> >> >>>> 501 static int siw_recv_mpa_rr(struct siw_cep *cep)
> >> >>>> 502 {
> >> >>>>            .............
> >> >>>> 572
> >> >>>> 573         if (rcvd > to_rcv)
> >> >>>> 574                 return -EPROTO;   <----- Failure here
> >> >>>>
> >> >>>> Looks like the intention here is to throw an ERROR if the
> >received
> >> >>>> data
> >> >>>> is more than the total private data length advertised by the
> >peer.
> >> >>>> But
> >> >>>> reading beyond MPA message causes siw_cm to generate
> >> >>>> RDMA_CM_EVENT_CONNECT_ERROR event when TCP socket recv buffer
> >is
> >> >>>> already
> >> >>>> queued with FPDU messages.
> >> >>>>
> >> >>>> Hence, this function should only read upto private data
> >length.
> >> >>>>
> >> >>>> Signed-off-by: Krishnamraju Eraparaju <krishna2@xxxxxxxxxxx>
> >> >>>> ---
> >> >>>> drivers/infiniband/sw/siw/siw_cm.c | 4 ++--
> >> >>>> 1 file changed, 2 insertions(+), 2 deletions(-)
> >> >>>>
> >> >>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c
> >> >>>> b/drivers/infiniband/sw/siw/siw_cm.c
> >> >>>> index a7cde98e73e8..8dc8cea2566c 100644
> >> >>>> --- a/drivers/infiniband/sw/siw/siw_cm.c
> >> >>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c
> >> >>>> @@ -559,13 +559,13 @@ static int siw_recv_mpa_rr(struct
> >siw_cep
> >> >> *cep)
> >> >>>> 	 * A private data buffer gets allocated if hdr->params.pd_len
> >!=
> >> >> 0.
> >> >>>> 	 */
> >> >>>> 	if (!cep->mpa.pdata) {
> >> >>>> -		cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL);
> >> >>>> +		cep->mpa.pdata = kmalloc(pd_len, GFP_KERNEL);
> >> >>>> 		if (!cep->mpa.pdata)
> >> >>>> 			return -ENOMEM;
> >> >>>> 	}
> >> >>>> 	rcvd = ksock_recv(
> >> >>>> 		s, cep->mpa.pdata + cep->mpa.bytes_rcvd - sizeof(struct
> >mpa_rr),
> >> >>>> -		to_rcv + 4, MSG_DONTWAIT);
> >> >>>> +		to_rcv, MSG_DONTWAIT);
> >> >>>>
> >> >>>> 	if (rcvd < 0)
> >> >>>> 		return rcvd;
> >> >>>> -- 
> >> >>>> 2.23.0.rc0
> >> >>>>
> >> >>>>
> >> >>>
> >> >>> The intention of this code is to make sure the
> >> >>> peer does not violates the MPA handshake rules.
> >> >>> The initiator MUST NOT send extra data after its
> >> >>> MPA request and before receiving the MPA reply.
> >> >>
> >> >> I think this is true only for MPA v2. With MPA v1, the
> >> >> initiator can begin sending immediately (before receiving
> >> >> the MPA reply), because there is no actual negotiation at
> >> >> the MPA layer.
> >> >>
> >> >> With MPA v2, the negotiation exchange is required because
> >> >> the type of the following message is predicated by the
> >> >> "Enhanced mode" a|b|c|d flags present in the first 32 bits
> >> >> of the private data buffer.
> >> >>
> >> >> So, it seems to me that additional logic is needed here to
> >> >> determine the MPA version, before sniffing additional octets
> >> >>from the incoming stream?
> >> >>
> >> >> Tom.
> >> >>
> >> > 
> >> > There still is the marker negotiation taking place.
> >> > RFC 5044 says in section 7.1.2:
> >> > 
> >> > "Note: Since the receiver's ability to deal with Markers is
> >> >   unknown until the Request and Reply Frames have been
> >> >   received, sending FPDUs before this occurs is not possible."
> >> > 
> >> > This section further discusses the responder's behavior,
> >> > where it MUST receive a first FPDU from the initiator
> >> > before sending its first FPDU:
> >> > 
> >> > "4.  MPA Responder mode implementations MUST receive and validate
> >at
> >> >         least one FPDU before sending any FPDUs or Markers."
> >> > 
> >> > So it appears with MPA version 1, the current siw
> >> > code is correct. The initiator is entering FPDU mode
> >> > first, and only after receiving the MPA reply frame.
> >> > Only after the initiator sent a first FPDU, the responder
> >> > can start using the connection in FPDU mode.
> >> > Because of this somehow broken connection establishment
> >> > procedure (only the initiator can start sending data), a
> >> > later MPA version makes this first FPDU exchange explicit
> >> > and selectable (zero length READ/WRITE/Send).
> >> 
> >> Yeah, I guess so. Because nobody ever actually implemented markers,
> >> I think that they may more or less passively ignore this. But
> >you're
> >> currect that it's invalid protocol behavior.
> >> 
> >> If your testing didn't uncover any issues with existing
> >implementations
> >> failing to connect with your strict checking, I'm ok with it.
> >Tom & Bernard,
> >Thanks for the insight on MPA negotiation.
> >
> >Could the below patch be considered as a proper fix?
> >
> >diff --git a/drivers/infiniband/sw/siw/siw_cm.c
> >b/drivers/infiniband/sw/siw/siw_cm.c
> >index 9ce8a1b925d2..0aec1b5212f9 100644
> >--- a/drivers/infiniband/sw/siw/siw_cm.c
> >+++ b/drivers/infiniband/sw/siw/siw_cm.c
> >@@ -503,6 +503,7 @@ static int siw_recv_mpa_rr(struct siw_cep *cep)
> >        struct socket *s = cep->sock;
> >        u16 pd_len;
> >        int rcvd, to_rcv;
> >+       int extra_data_check = 4; /* 4Bytes, for MPA rules violation
> >checking */
> >
> >        if (cep->mpa.bytes_rcvd < sizeof(struct mpa_rr)) {
> >                rcvd = ksock_recv(s, (char *)hdr +
> >cep->mpa.bytes_rcvd,
> >@@ -553,23 +554,37 @@ static int siw_recv_mpa_rr(struct siw_cep *cep)
> >                return -EPROTO;
> >        }
> >
> >+       /*
> >+        * Peer must not send any extra data other than MPA messages
> >until MPA
> >+        * negotiation is completed, an exception is MPA V2
> >client-server Mode,
> >+        * IE, in this mode the peer after sending MPA Reply can
> >immediately
> >+        * start sending data in RDMA mode.
> >+        */
> 
> This is unfortunately not true. The responder NEVER sends
> an FPDU without having seen an FPDU from the initiator.
> I just checked RFC 6581 again. The RTR (ready-to-receive)
> indication from the initiator is still needed, but now
> provided by the protocol and not the application: w/o
> 'enhanced MPA setup', the initiator sends the first
> FPDU as an application message. With 'enhanced MPA setup',
> the initiator protocol entity sends (w/o application
> interaction) a zero length READ/WRITE/Send as a first FPDU,
> as previously negotiated with the responder. Again: only
> after the responder has seen the first FPDU, it can
> start sending in FPDU mode.
If I understand your statements correctly: in MPA v2 clint-server mode,
the responder application should have some logic to wait(after ESTABLISH
event) until the first FPDU message is received from the initiator?

Thanks,
Krishna.
> 
> Sorry for the confusion. But the current
> siw code appears to be just correct.
> 
> Thanks
> Bernard
> 
> 
> 
> >+       if ((__mpa_rr_revision(cep->mpa.hdr.params.bits) ==
> >MPA_REVISION_2) &&
> >+               (cep->state == SIW_EPSTATE_AWAIT_MPAREP)) {
> >+               int mpa_p2p_mode = cep->mpa.v2_ctrl_req.ord &
> >+                               (MPA_V2_RDMA_WRITE_RTR |
> >MPA_V2_RDMA_READ_RTR);
> >+               if (!mpa_p2p_mode)
> >+                       extra_data_check = 0;
> >+       }
> >+
> >        /*
> >         * At this point, we must have hdr->params.pd_len != 0.
> >         * A private data buffer gets allocated if hdr->params.pd_len
> >!=
> >         * 0.
> >         */
> >        if (!cep->mpa.pdata) {
> >-               cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL);
> >+               cep->mpa.pdata = kmalloc(pd_len + extra_data_check,
> >GFP_KERNEL);
> >                if (!cep->mpa.pdata)
> >                        return -ENOMEM;
> >        }
> >        rcvd = ksock_recv(
> >                s, cep->mpa.pdata + cep->mpa.bytes_rcvd -
> >sizeof(struct
> >mpa_rr),
> >-               to_rcv + 4, MSG_DONTWAIT);
> >+               to_rcv + extra_data_check, MSG_DONTWAIT);
> >
> >        if (rcvd < 0)
> >                return rcvd;
> >
> >-       if (rcvd > to_rcv)
> >+       if (extra_data_check && (rcvd > to_rcv))
> >                return -EPROTO;
> >
> >        cep->mpa.bytes_rcvd += rcvd;
> >
> >-Krishna.
> >> 
> >> Tom.
> >> 
> >> 
> >> 
> >> > 
> >> > Bernard.
> >> > 
> >> >>
> >> >>> So, for the MPA request case, this code is needed
> >> >>> to check for protocol correctness.
> >> >>> You are right for the MPA reply case - if we are
> >> >>> _not_ in peer2peer mode, the peer can immediately
> >> >>> start sending data in RDMA mode after its MPA Reply.
> >> >>> So we shall add appropriate code to be more specific
> >> >>> For an error, we are (1) processing an MPA Request,
> >> >>> OR (2) processing an MPA Reply AND we are not expected
> >> >>> to send an initial READ/WRITE/Send as negotiated with
> >> >>> the peer (peer2peer mode MPA handshake).
> >> >>>
> >> >>> Just removing this check would make siw more permissive,
> >> >>> but to a point where peer MPA protocol errors are
> >> >>> tolerated. I am not in favor of that level of
> >> >>> forgiveness.
> >> >>>
> >> >>> If possible, please provide an appropriate patch
> >> >>> or (if it causes current issues with another peer
> >> >>> iWarp implementation) just run in MPA peer2peer mode,
> >> >>> where the current check is appropriate.
> >> >>> Otherwise, I would provide an appropriate fix by Monday
> >> >>> (I am still out of office this week).
> >> >>>
> >> >>>
> >> >>> Many thanks and best regards,
> >> >>> Bernard.
> >> >>>
> >> >>>
> >> >>>
> >> >>
> >> >>
> >> > 
> >> > 
> >> > 
> >
> >
> 



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux