On Monday, August 08/12/19, 2019 at 12:56:18 +0000, Bernard Metzler wrote: > -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: ----- > > >To: "Bernard Metzler" <bmt@xxxxxxxxxxxxxx> > >From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> > >Date: 08/12/2019 11:59AM > >Cc: "Tom Talpey" <tom@xxxxxxxxxx>, "jgg@xxxxxxxx" <jgg@xxxxxxxx>, > >"linux-rdma@xxxxxxxxxxxxxxx" <linux-rdma@xxxxxxxxxxxxxxx>, "Potnuri > >Bharat Teja" <bharat@xxxxxxxxxxx>, "Nirranjan Kirubaharan" > ><nirranjan@xxxxxxxxxxx> > >Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler tries > >to read beyond MPA message > > > >On Saturday, August 08/10/19, 2019 at 02:05:00 +0530, Tom Talpey > >wrote: > >> On 8/9/2019 9:52 AM, Bernard Metzler wrote: > >> > -----"Tom Talpey" <tom@xxxxxxxxxx> wrote: ----- > >> > > >> >> To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>, "Krishnamraju > >Eraparaju" > >> >> <krishna2@xxxxxxxxxxx> > >> >> From: "Tom Talpey" <tom@xxxxxxxxxx> > >> >> Date: 08/09/2019 02:27PM > >> >> Cc: "jgg@xxxxxxxx" <jgg@xxxxxxxx>, "linux-rdma@xxxxxxxxxxxxxxx" > >> >> <linux-rdma@xxxxxxxxxxxxxxx>, "Potnuri Bharat Teja" > >> >> <bharat@xxxxxxxxxxx>, "Nirranjan Kirubaharan" > ><nirranjan@xxxxxxxxxxx> > >> >> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler > >tries > >> >> to read beyond MPA message > >> >> > >> >> On 8/9/2019 6:29 AM, Bernard Metzler wrote: > >> >>> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: > >----- > >> >>> > >> >>>> To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx> > >> >>>> From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> > >> >>>> Date: 08/08/2019 06:47PM > >> >>>> Cc: "Tom Talpey" <tom@xxxxxxxxxx>, "jgg@xxxxxxxx" > ><jgg@xxxxxxxx>, > >> >>>> "linux-rdma@xxxxxxxxxxxxxxx" <linux-rdma@xxxxxxxxxxxxxxx>, > >> >> "Potnuri > >> >>>> Bharat Teja" <bharat@xxxxxxxxxxx>, "Nirranjan Kirubaharan" > >> >>>> <nirranjan@xxxxxxxxxxx> > >> >>>> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply handler > >> >> tries > >> >>>> to read beyond MPA message > >> >>>> > >> >>>> On Thursday, August 08/08/19, 2019 at 15:05:12 +0000, Bernard > >> >> Metzler > >> >>>> wrote: > >> >>>>> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: > >----- > >> >>>>> > >> >>>>>> To: "Tom Talpey" <tom@xxxxxxxxxx>, <BMT@xxxxxxxxxxxxxx> > >> >>>>>> From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> > >> >>>>>> Date: 08/05/2019 07:26PM > >> >>>>>> Cc: "jgg@xxxxxxxx" <jgg@xxxxxxxx>, > >"linux-rdma@xxxxxxxxxxxxxxx" > >> >>>>>> <linux-rdma@xxxxxxxxxxxxxxx>, "Potnuri Bharat Teja" > >> >>>>>> <bharat@xxxxxxxxxxx>, "Nirranjan Kirubaharan" > >> >>>> <nirranjan@xxxxxxxxxxx> > >> >>>>>> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply > >handler > >> >>>> tries > >> >>>>>> to read beyond MPA message > >> >>>>>> > >> >>>>>> On Friday, August 08/02/19, 2019 at 18:17:37 +0530, Tom > >Talpey > >> >>>> wrote: > >> >>>>>>> On 8/2/2019 7:18 AM, Bernard Metzler wrote: > >> >>>>>>>> -----"Tom Talpey" <tom@xxxxxxxxxx> wrote: ----- > >> >>>>>>>> > >> >>>>>>>>> To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>, "Krishnamraju > >> >>>>>> Eraparaju" > >> >>>>>>>>> <krishna2@xxxxxxxxxxx> > >> >>>>>>>>> From: "Tom Talpey" <tom@xxxxxxxxxx> > >> >>>>>>>>> Date: 08/01/2019 08:54PM > >> >>>>>>>>> Cc: jgg@xxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx, > >> >>>>>> bharat@xxxxxxxxxxx, > >> >>>>>>>>> nirranjan@xxxxxxxxxxx, krishn2@xxxxxxxxxxx > >> >>>>>>>>> Subject: [EXTERNAL] Re: [PATCH for-rc] siw: MPA Reply > >handler > >> >>>>>> tries > >> >>>>>>>>> to read beyond MPA message > >> >>>>>>>>> > >> >>>>>>>>> On 8/1/2019 6:56 AM, Bernard Metzler wrote: > >> >>>>>>>>>> -----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> > >wrote: > >> >>>>>> ----- > >> >>>>>>>>>> > >> >>>>>>>>>>> To: jgg@xxxxxxxx, bmt@xxxxxxxxxxxxxx > >> >>>>>>>>>>> From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> > >> >>>>>>>>>>> Date: 07/31/2019 12:34PM > >> >>>>>>>>>>> Cc: linux-rdma@xxxxxxxxxxxxxxx, bharat@xxxxxxxxxxx, > >> >>>>>>>>>>> nirranjan@xxxxxxxxxxx, krishn2@xxxxxxxxxxx, > >"Krishnamraju > >> >>>>>>>>> Eraparaju" > >> >>>>>>>>>>> <krishna2@xxxxxxxxxxx> > >> >>>>>>>>>>> Subject: [EXTERNAL] [PATCH for-rc] siw: MPA Reply > >handler > >> >>>>>> tries to > >> >>>>>>>>>>> read beyond MPA message > >> >>>>>>>>>>> > >> >>>>>>>>>>> while processing MPA Reply, SIW driver is trying to > >read > >> >>>> extra > >> >>>>>> 4 > >> >>>>>>>>>>> bytes > >> >>>>>>>>>>> than what peer has advertised as private data length. > >> >>>>>>>>>>> > >> >>>>>>>>>>> If a FPDU data is received before even > >siw_recv_mpa_rr() > >> >>>>>> completed > >> >>>>>>>>>>> reading MPA reply, then ksock_recv() in > >siw_recv_mpa_rr() > >> >>>>>> could > >> >>>>>>>>> also > >> >>>>>>>>>>> read FPDU, if "size" is larger than advertised MPA > >reply > >> >>>>>> length. > >> >>>>>>>>>>> > >> >>>>>>>>>>> 501 static int siw_recv_mpa_rr(struct siw_cep *cep) > >> >>>>>>>>>>> 502 { > >> >>>>>>>>>>> ............. > >> >>>>>>>>>>> 572 > >> >>>>>>>>>>> 573 if (rcvd > to_rcv) > >> >>>>>>>>>>> 574 return -EPROTO; <----- Failure > >here > >> >>>>>>>>>>> > >> >>>>>>>>>>> Looks like the intention here is to throw an ERROR if > >the > >> >>>>>> received > >> >>>>>>>>>>> data > >> >>>>>>>>>>> is more than the total private data length advertised > >by > >> >>>> the > >> >>>>>> peer. > >> >>>>>>>>>>> But > >> >>>>>>>>>>> reading beyond MPA message causes siw_cm to generate > >> >>>>>>>>>>> RDMA_CM_EVENT_CONNECT_ERROR event when TCP socket recv > >> >>>> buffer > >> >>>>>> is > >> >>>>>>>>>>> already > >> >>>>>>>>>>> queued with FPDU messages. > >> >>>>>>>>>>> > >> >>>>>>>>>>> Hence, this function should only read upto private data > >> >>>>>> length. > >> >>>>>>>>>>> > >> >>>>>>>>>>> Signed-off-by: Krishnamraju Eraparaju > >> >>>> <krishna2@xxxxxxxxxxx> > >> >>>>>>>>>>> --- > >> >>>>>>>>>>> drivers/infiniband/sw/siw/siw_cm.c | 4 ++-- > >> >>>>>>>>>>> 1 file changed, 2 insertions(+), 2 deletions(-) > >> >>>>>>>>>>> > >> >>>>>>>>>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c > >> >>>>>>>>>>> b/drivers/infiniband/sw/siw/siw_cm.c > >> >>>>>>>>>>> index a7cde98e73e8..8dc8cea2566c 100644 > >> >>>>>>>>>>> --- a/drivers/infiniband/sw/siw/siw_cm.c > >> >>>>>>>>>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c > >> >>>>>>>>>>> @@ -559,13 +559,13 @@ static int siw_recv_mpa_rr(struct > >> >>>>>> siw_cep > >> >>>>>>>>> *cep) > >> >>>>>>>>>>> * A private data buffer gets allocated if > >> >>>> hdr->params.pd_len > >> >>>>>> != > >> >>>>>>>>> 0. > >> >>>>>>>>>>> */ > >> >>>>>>>>>>> if (!cep->mpa.pdata) { > >> >>>>>>>>>>> - cep->mpa.pdata = kmalloc(pd_len + 4, GFP_KERNEL); > >> >>>>>>>>>>> + cep->mpa.pdata = kmalloc(pd_len, GFP_KERNEL); > >> >>>>>>>>>>> if (!cep->mpa.pdata) > >> >>>>>>>>>>> return -ENOMEM; > >> >>>>>>>>>>> } > >> >>>>>>>>>>> rcvd = ksock_recv( > >> >>>>>>>>>>> s, cep->mpa.pdata + cep->mpa.bytes_rcvd - > >sizeof(struct > >> >>>>>> mpa_rr), > >> >>>>>>>>>>> - to_rcv + 4, MSG_DONTWAIT); > >> >>>>>>>>>>> + to_rcv, MSG_DONTWAIT); > >> >>>>>>>>>>> > >> >>>>>>>>>>> if (rcvd < 0) > >> >>>>>>>>>>> return rcvd; > >> >>>>>>>>>>> -- > >> >>>>>>>>>>> 2.23.0.rc0 > >> >>>>>>>>>>> > >> >>>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> The intention of this code is to make sure the > >> >>>>>>>>>> peer does not violates the MPA handshake rules. > >> >>>>>>>>>> The initiator MUST NOT send extra data after its > >> >>>>>>>>>> MPA request and before receiving the MPA reply. > >> >>>>>>>>> > >> >>>>>>>>> I think this is true only for MPA v2. With MPA v1, the > >> >>>>>>>>> initiator can begin sending immediately (before receiving > >> >>>>>>>>> the MPA reply), because there is no actual negotiation at > >> >>>>>>>>> the MPA layer. > >> >>>>>>>>> > >> >>>>>>>>> With MPA v2, the negotiation exchange is required because > >> >>>>>>>>> the type of the following message is predicated by the > >> >>>>>>>>> "Enhanced mode" a|b|c|d flags present in the first 32 > >bits > >> >>>>>>>>> of the private data buffer. > >> >>>>>>>>> > >> >>>>>>>>> So, it seems to me that additional logic is needed here > >to > >> >>>>>>>>> determine the MPA version, before sniffing additional > >octets > >> >>>>>>>> >from the incoming stream? > >> >>>>>>>>> > >> >>>>>>>>> Tom. > >> >>>>>>>>> > >> >>>>>>>> > >> >>>>>>>> There still is the marker negotiation taking place. > >> >>>>>>>> RFC 5044 says in section 7.1.2: > >> >>>>>>>> > >> >>>>>>>> "Note: Since the receiver's ability to deal with Markers > >is > >> >>>>>>>> unknown until the Request and Reply Frames have been > >> >>>>>>>> received, sending FPDUs before this occurs is not > >> >> possible." > >> >>>>>>>> > >> >>>>>>>> This section further discusses the responder's behavior, > >> >>>>>>>> where it MUST receive a first FPDU from the initiator > >> >>>>>>>> before sending its first FPDU: > >> >>>>>>>> > >> >>>>>>>> "4. MPA Responder mode implementations MUST receive and > >> >>>> validate > >> >>>>>> at > >> >>>>>>>> least one FPDU before sending any FPDUs or > >Markers." > >> >>>>>>>> > >> >>>>>>>> So it appears with MPA version 1, the current siw > >> >>>>>>>> code is correct. The initiator is entering FPDU mode > >> >>>>>>>> first, and only after receiving the MPA reply frame. > >> >>>>>>>> Only after the initiator sent a first FPDU, the responder > >> >>>>>>>> can start using the connection in FPDU mode. > >> >>>>>>>> Because of this somehow broken connection establishment > >> >>>>>>>> procedure (only the initiator can start sending data), a > >> >>>>>>>> later MPA version makes this first FPDU exchange explicit > >> >>>>>>>> and selectable (zero length READ/WRITE/Send). > >> >>>>>>> > >> >>>>>>> Yeah, I guess so. Because nobody ever actually implemented > >> >>>> markers, > >> >>>>>>> I think that they may more or less passively ignore this. > >But > >> >>>>>> you're > >> >>>>>>> currect that it's invalid protocol behavior. > >> >>>>>>> > >> >>>>>>> If your testing didn't uncover any issues with existing > >> >>>>>> implementations > >> >>>>>>> failing to connect with your strict checking, I'm ok with > >it. > >> >>>>>> Tom & Bernard, > >> >>>>>> Thanks for the insight on MPA negotiation. > >> >>>>>> > >> >>>>>> Could the below patch be considered as a proper fix? > >> >>>>>> > >> >>>>>> diff --git a/drivers/infiniband/sw/siw/siw_cm.c > >> >>>>>> b/drivers/infiniband/sw/siw/siw_cm.c > >> >>>>>> index 9ce8a1b925d2..0aec1b5212f9 100644 > >> >>>>>> --- a/drivers/infiniband/sw/siw/siw_cm.c > >> >>>>>> +++ b/drivers/infiniband/sw/siw/siw_cm.c > >> >>>>>> @@ -503,6 +503,7 @@ static int siw_recv_mpa_rr(struct > >siw_cep > >> >>>> *cep) > >> >>>>>> struct socket *s = cep->sock; > >> >>>>>> u16 pd_len; > >> >>>>>> int rcvd, to_rcv; > >> >>>>>> + int extra_data_check = 4; /* 4Bytes, for MPA rules > >> >>>> violation > >> >>>>>> checking */ > >> >>>>>> > >> >>>>>> if (cep->mpa.bytes_rcvd < sizeof(struct mpa_rr)) { > >> >>>>>> rcvd = ksock_recv(s, (char *)hdr + > >> >>>>>> cep->mpa.bytes_rcvd, > >> >>>>>> @@ -553,23 +554,37 @@ static int siw_recv_mpa_rr(struct > >siw_cep > >> >>>> *cep) > >> >>>>>> return -EPROTO; > >> >>>>>> } > >> >>>>>> > >> >>>>>> + /* > >> >>>>>> + * Peer must not send any extra data other than MPA > >> >>>> messages > >> >>>>>> until MPA > >> >>>>>> + * negotiation is completed, an exception is MPA V2 > >> >>>>>> client-server Mode, > >> >>>>>> + * IE, in this mode the peer after sending MPA Reply > >can > >> >>>>>> immediately > >> >>>>>> + * start sending data in RDMA mode. > >> >>>>>> + */ > >> >>>>> > >> >>>>> This is unfortunately not true. The responder NEVER sends > >> >>>>> an FPDU without having seen an FPDU from the initiator. > >> >>>>> I just checked RFC 6581 again. The RTR (ready-to-receive) > >> >>>>> indication from the initiator is still needed, but now > >> >>>>> provided by the protocol and not the application: w/o > >> >>>>> 'enhanced MPA setup', the initiator sends the first > >> >>>>> FPDU as an application message. With 'enhanced MPA setup', > >> >>>>> the initiator protocol entity sends (w/o application > >> >>>>> interaction) a zero length READ/WRITE/Send as a first FPDU, > >> >>>>> as previously negotiated with the responder. Again: only > >> >>>>> after the responder has seen the first FPDU, it can > >> >>>>> start sending in FPDU mode. > >> >>>> If I understand your statements correctly: in MPA v2 > >clint-server > >> >>>> mode, > >> >>>> the responder application should have some logic to wait(after > >> >>>> ESTABLISH > >> >>>> event) until the first FPDU message is received from the > >> >> initiator? > >> >>>> > >> >>>> Thanks, > >> >>>> Krishna. > >> >>> > >> >>> The responder may either delay the established event > >> >>> until it sees the first peer FPDU, or it delays sending > >> >>> the first (now already posted) FPDU, until it sees the > >> >>> first initiator FPDU. This, since (theoretically), the > >> >>> responder does not know yet the result of CRC and > >> >>> Marker negotiation. siw delays the established event, > >> >>> which seems the more clear path to me. > >> >>> > >> >>> This MPA handshake is cumbersome. siw is permissive > >> >>> for violations of these rules to the extend, that the > >> >>> responder shall not send back to back the first FPDU > >> >>> with the MPA reply. If a (Chelsio ?) iWarp adapter > >> >>> would send the first responder FPDU only after it > >> >>> got TCP acked its MPA reply, the chances are high the > >> >>> (siw) iWarp initiator has already transitioned to RTS > >> >>> and it would accept that FPU, even if not having sent > >> >>> it's first FPDU. > >> >>> > >> >>> siw sticks to this rule to that extend, since all receive > >> >>> processing is triggered by socket callbacks (from > >> >>> sk->sk_data_ready()). As long as the QP is not > >> >>> in RTS, all TCP bytestream processing is done by the > >> >>> MPA protocol (code in siw_cm.c). If the QP reaches > >> >>> RTS, RX processing is done bt the full iWarp > >> >>> protocol (code in siw_qp_rx.c for RX). > >> >>> Now, if the higher layer (application) protocol has > >> >>> a semantic where the responder sends the first message, > >> >>> we could end up in an infinite wait for that packet > >> >>> at initiator application side. > >> >>> This, since the complete first FPDU was already > >> >>> received at the TCP socket, while the QP was not in > >> >>> RTS. It will not trigger any additional sk->sk_data_ready() > >> >>> and we are stuck with a FPDU in the socket rx buffer. > >> >>> > >> >>> I implemented that 4 bytes extra data testing only to > >> >>> keep the siw protocol state machine as simple as > >> >>> possible (it is already unfortunately complex, if > >> >>> you count the lines in siw_cm.c), while following > >> >>> the protocol rules. > >> >>> > >> >>> I suggest to correctly implement peer2peer mode and > >> >>> delay the established event at the responder side until > >> >>> it got the negotiated first zero length FPDU. Out > >> >>> of the possible options, siw supports both zero length > >> >>> WRITE and READ, but no SEND, since this would consume > >> >>> an extra receive WQE. > >> >>> > >> >>> As a last resort, I might consider extending the > >> >>> siw state machine to handle those - non-complaint - > >> >>> cases gracefully. But, that would result in different > >> >>> code than just avoiding checking for peer protocol > >> >>> violation. > >> >> > >> >> Bernard, everything you describe in siw seems perfectly valid > >> >> to me, and in fact that's why the MPA enhanced connection mode > >> >> was designed the way it is. I disagree that it's "cumbersome", > >> >> but that's a nit. > >> >> > >> >> The issue is that the responder reaches RTS at a different time > >> >> than the initiator reaches RTR. The original iWARP connection > >> >> model did not make any requirement, and races were possible. > >> >> MPAv2 fixes those. > >> >> > >> > Tom, thanks, exactly. MPAv2 fixes it. And an implementation > > > >Thanks Bernard & Tom, > > > >Could you also please consider making MPA ehanced connection > >mode(with RTR handshake) as default for SIW, as MPA V2 peer- > >to-peer mode seems to be more promising that MPA V2 > >client-server Mode, wrt race conditions. > > > >Currently SIW has MPA V2 client-server Mode as default. > > > > > >In siw_main.c: > > const bool peer_to_peer = 1; > > > > > >Between, as per my observations, the chances of hitting this 'connect > >error' issue is higher with: SIW(initator)<--->(responder)SIW > >than: SIW(initator) <---> (responder)hard-iwarp > > Good to know! Yes, siw is permissive at responder side, it's > applications responsibility to delay the first message, if p2p > mode is not selected. > What application are you using here (which let's > the responder immediately start using the SQ)? Just to let > me re-create the case w/o effort. > I might produce a fix for siw which can handle that > case for the non-p2p-mode case. I originally noticed this issue with a bidirectional bandwidth measurement tool(that we use internally). Later I wrote a small program to quickly recreate this issue. please follow the below steps: - git clone https://github.com/krishna-e/rdma-example-programs - cd rdma-example-programs/IBV_WR_RDMA_WRITE_for_SIW_plus_4_issue/ - server side: # cc -g server.c -libverbs -lrdmacm -o server # ./server <any_integer_data> - client side: # echo -n 'module siw +p' > # /sys/kernel/debug/dynamic_debug/control (just to slow down the client processing by enabling traces) # cc -g client.c -libverbs -lrdmacm -o client # ./client <ip_address> output: Unexpected event : RDMA_CM_EVENT_CONNECT_ERROR > > Setting p2p mode as default is another option. I am not > immediately happy with that suggestion, since it adds > another wire transfer as default. The future shall see > those parameters settable. These were module parameters, > but we abandoned that. We should probably come up with > an extension to the RDMA netlink interface, which > makes those protocol specific modes and flavors settable > per device...? > > Thanks > Bernard. > > > > > >> > of that fix MUST adhere to the defined rules. If one > >> > negotiates an extra handshake to synchronize, it > >> > MUST adhere to that handshake rules. There is no point in > >> > negotiating an extra handshake, and right away ignoring it. > >> > If the responder wants a zero length WRITE, it MUST use > >> > its reception to synchronize its transition to RTS. > >> > > >> > Shall we extend the siw state machine to support silly > >> > peer behavior? As said, if I read there is no way of > >> > >> Of course the right answer to this is "no", but I'd reserve > >> drawing that conclusion until I am certain it won't cause > >> trouble. If existing iWARP implementations don't enforce > >> this, and SIW becomes the odd duck failing connections, then > >> I'm afraid I would have to say "yes" instead. > >> > >> > making the peer implementation RFC compliant (it would > >> > be unexpected), I'll look into the applicability of > >> > Postel's Rule. But let's make those things explicit. > >> > >> Explicit is best, agreed. > >> > >> Tom. > >> > >> > > >> >> SIW is certainly within its rights to prevent peers from > >> >> sending prior to responder RTR. I'd suggest that if possible, > >> >> you try to detect the deadlock rather than flatly rejecting any > >> >> incoming bytes. The Internet Principle (be liberal in what you > >> >> accept...), after all. > >> >> > >> >> This same dance happens in IB/RoCE, btw. Just via different > >> >> messages. > >> >> > >> > Interesting! > >> > > >> > Thanks > >> > Bernard. > >> > > >> >> Tom. > >> >> > >> >> > >> >>> > >> >>> Thanks, > >> >>> Bernard. > >> >>> > >> >>> > >> >>>>> > >> >>>>> Sorry for the confusion. But the current > >> >>>>> siw code appears to be just correct. > >> >>>>> > >> >>>>> Thanks > >> >>>>> Bernard > >> >>>>> > >> >>>>> > >> >>>>> > >> >>>>>> + if ((__mpa_rr_revision(cep->mpa.hdr.params.bits) == > >> >>>>>> MPA_REVISION_2) && > >> >>>>>> + (cep->state == SIW_EPSTATE_AWAIT_MPAREP)) { > >> >>>>>> + int mpa_p2p_mode = cep->mpa.v2_ctrl_req.ord > >& > >> >>>>>> + (MPA_V2_RDMA_WRITE_RTR | > >> >>>>>> MPA_V2_RDMA_READ_RTR); > >> >>>>>> + if (!mpa_p2p_mode) > >> >>>>>> + extra_data_check = 0; > >> >>>>>> + } > >> >>>>>> + > >> >>>>>> /* > >> >>>>>> * At this point, we must have hdr->params.pd_len > >!= 0. > >> >>>>>> * A private data buffer gets allocated if > >> >>>> hdr->params.pd_len > >> >>>>>> != > >> >>>>>> * 0. > >> >>>>>> */ > >> >>>>>> if (!cep->mpa.pdata) { > >> >>>>>> - cep->mpa.pdata = kmalloc(pd_len + 4, > >> >> GFP_KERNEL); > >> >>>>>> + cep->mpa.pdata = kmalloc(pd_len + > >> >>>> extra_data_check, > >> >>>>>> GFP_KERNEL); > >> >>>>>> if (!cep->mpa.pdata) > >> >>>>>> return -ENOMEM; > >> >>>>>> } > >> >>>>>> rcvd = ksock_recv( > >> >>>>>> s, cep->mpa.pdata + cep->mpa.bytes_rcvd - > >> >>>>>> sizeof(struct > >> >>>>>> mpa_rr), > >> >>>>>> - to_rcv + 4, MSG_DONTWAIT); > >> >>>>>> + to_rcv + extra_data_check, MSG_DONTWAIT); > >> >>>>>> > >> >>>>>> if (rcvd < 0) > >> >>>>>> return rcvd; > >> >>>>>> > >> >>>>>> - if (rcvd > to_rcv) > >> >>>>>> + if (extra_data_check && (rcvd > to_rcv)) > >> >>>>>> return -EPROTO; > >> >>>>>> > >> >>>>>> cep->mpa.bytes_rcvd += rcvd; > >> >>>>>> > >> >>>>>> -Krishna. > >> >>>>>>> > >> >>>>>>> Tom. > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> > >> >>>>>>>> > >> >>>>>>>> Bernard. > >> >>>>>>>> > >> >>>>>>>>> > >> >>>>>>>>>> So, for the MPA request case, this code is needed > >> >>>>>>>>>> to check for protocol correctness. > >> >>>>>>>>>> You are right for the MPA reply case - if we are > >> >>>>>>>>>> _not_ in peer2peer mode, the peer can immediately > >> >>>>>>>>>> start sending data in RDMA mode after its MPA Reply. > >> >>>>>>>>>> So we shall add appropriate code to be more specific > >> >>>>>>>>>> For an error, we are (1) processing an MPA Request, > >> >>>>>>>>>> OR (2) processing an MPA Reply AND we are not expected > >> >>>>>>>>>> to send an initial READ/WRITE/Send as negotiated with > >> >>>>>>>>>> the peer (peer2peer mode MPA handshake). > >> >>>>>>>>>> > >> >>>>>>>>>> Just removing this check would make siw more permissive, > >> >>>>>>>>>> but to a point where peer MPA protocol errors are > >> >>>>>>>>>> tolerated. I am not in favor of that level of > >> >>>>>>>>>> forgiveness. > >> >>>>>>>>>> > >> >>>>>>>>>> If possible, please provide an appropriate patch > >> >>>>>>>>>> or (if it causes current issues with another peer > >> >>>>>>>>>> iWarp implementation) just run in MPA peer2peer mode, > >> >>>>>>>>>> where the current check is appropriate. > >> >>>>>>>>>> Otherwise, I would provide an appropriate fix by Monday > >> >>>>>>>>>> (I am still out of office this week). > >> >>>>>>>>>> > >> >>>>>>>>>> > >> >>>>>>>>>> Many thanks and best regards, > >> >>>>>>>>>> Bernard. > >> >>>>>>>>>> > >> >> > >> >> > >> > > >> > > >> > > > > > >