On 2021/8/16 15:47, Zhu Yanjun wrote: > On Wed, Aug 11, 2021 at 12:33 AM yangx.jy@xxxxxxxxxxx > <yangx.jy@xxxxxxxxxxx> wrote: >> On 2021/8/10 11:40, Zhu Yanjun wrote: >>> On Mon, Aug 9, 2021 at 10:43 PM Xiao Yang<yangx.jy@xxxxxxxxxxx> wrote: >>>> Resid indicates the residual length of transmitted bytes but current >>>> rxe sets it to zero for inline data at the beginning. In this case, >>>> request will transmit zero byte to responder wrongly. >>> What are the symptoms if resid is set to zero? >> Hi Yanjun, >> >> You can construct code by the attached patch and then run >> rdma_client/server to reproduce the issue. >> >> If resid is set to zero, running rdma_client/server: >> -------------------------------- >> # ./rdma_client -s 10.167.220.99 -p 8765 >> rdma_client: start >> rdma_client: end 0 >> >> # ./rdma_server -s 10.167.220.99 -p 8765 >> rdma_server: start >> wc.byte_len 0 recv_msg bbbbbbbbbbbbbbbb >> rdma_server: end 0 >> -------------------------------- >> >> rdma_client sends zero byte instead of 16 btyes to rdma_server. >> rdma_server receives zero byte instead of 16 btyes from rdma_client. >> >> Please also see the logic about resid in kernel, for example: >> -------------------------------- >> int rxe_requester(void *arg) >> { >> ... >> payload = (mask& RXE_WRITE_OR_SEND) ? wqe->dma.resid : 0; >> ... >> skb = init_req_packet(qp, wqe, opcode, payload,&pkt); >> ... >> } >> -------------------------------- >> >> Best Regards, >> Xiao Yang > Follow your steps on the latest rdma-core and linux upstream, > make tests with the followings: > " > diff --git a/librdmacm/examples/rdma_client.c b/librdmacm/examples/rdma_client.c > index c27047c5..6734757b 100644 > --- a/librdmacm/examples/rdma_client.c > +++ b/librdmacm/examples/rdma_client.c > @@ -52,6 +52,7 @@ static int run(void) > struct ibv_wc wc; > int ret; > > + memset(send_msg, 'a', 16); > memset(&hints, 0, sizeof hints); > hints.ai_port_space = RDMA_PS_TCP; > ret = rdma_getaddrinfo(server, port,&hints,&res); > diff --git a/librdmacm/examples/rdma_server.c b/librdmacm/examples/rdma_server.c > index f9c766b2..afa20996 100644 > --- a/librdmacm/examples/rdma_server.c > +++ b/librdmacm/examples/rdma_server.c > @@ -132,6 +132,7 @@ static int run(void) > goto out_disconnect; > } > > + printf("wc.byte_len %u recv_msg %s\n", wc.byte_len, recv_msg); > ret = rdma_post_send(id, NULL, send_msg, 16, send_mr, send_flags); > if (ret) { > perror("rdma_post_send"); > " > The followings are results: > > # ./build/bin/rdma_server -s 10.238.154.61 -p 5486& > [1] 10812 > # rdma_server: start > > # ./build/bin/rdma_client -s 10.238.154.61 -p 5486& > [2] 10815 > # rdma_client: start > wc.byte_len 16 recv_msg aaaaaaaaaaaaaaaa<--------------it seems > that inline is 16? > rdma_server: end 0 > rdma_client: end 0 > [1]- Done ./build/bin/rdma_server -s 10.238.154.61 -p 5486 > [2]+ Done ./build/bin/rdma_client -s 10.238.154.61 -p 5486 > > What does your commit fix? Hi Yanjun, You missed the change that sets resid to zero on purpose: ------------------------------------------------------------- diff --git a/providers/rxe/rxe.c b/providers/rxe/rxe.c index 3c3ea8bb..ed5479fe 100644 --- a/providers/rxe/rxe.c +++ b/providers/rxe/rxe.c @@ -1492,7 +1492,7 @@ static int init_send_wqe(struct rxe_qp *qp, struct rxe_wq *sq, wqe->iova = ibwr->wr.rdma.remote_addr; wqe->dma.length = length; - wqe->dma.resid = length; + wqe->dma.resid = 0; wqe->dma.num_sge = num_sge; wqe->dma.cur_sge = 0; wqe->dma.sge_offset = 0; ------------------------------------------------------------- Note: It is also ok to remove "wqe->dma.resid = length" here because resid has been set to zero before. With the change, running rdma_client/server will show the impact. I didn't use new API(e.g. ibv_wr_set_inline_data, ibv_wr_send) to create a new test but it is enough to show the impact of resid == 0 by doing some changes on rxe and rdma_client/server. Best Regards, Xiao Yang > Zhu Yanjun > >>> Thanks >>> Zhu Yanjun >>> >>>> Resid should be set to the total length of transmitted bytes at the >>>> beginning. >>>> >>>> Note: >>>> Just remove the useless setting of resid in init_send_wqe(). >>>> >>>> Fixes: 1a894ca10105 ("Providers/rxe: Implement ibv_create_qp_ex verb") >>>> Fixes: 8337db5df125 ("Providers/rxe: Implement memory windows") >>>> Signed-off-by: Xiao Yang<yangx.jy@xxxxxxxxxxx> >>>> --- >>>> providers/rxe/rxe.c | 5 ++--- >>>> 1 file changed, 2 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/providers/rxe/rxe.c b/providers/rxe/rxe.c >>>> index 3c3ea8bb..3533a325 100644 >>>> --- a/providers/rxe/rxe.c >>>> +++ b/providers/rxe/rxe.c >>>> @@ -1004,7 +1004,7 @@ static void wr_set_inline_data(struct ibv_qp_ex *ibqp, void *addr, >>>> >>>> memcpy(wqe->dma.inline_data, addr, length); >>>> wqe->dma.length = length; >>>> - wqe->dma.resid = 0; >>>> + wqe->dma.resid = length; >>>> } >>>> >>>> static void wr_set_inline_data_list(struct ibv_qp_ex *ibqp, size_t num_buf, >>>> @@ -1035,6 +1035,7 @@ static void wr_set_inline_data_list(struct ibv_qp_ex *ibqp, size_t num_buf, >>>> } >>>> >>>> wqe->dma.length = tot_length; >>>> + wqe->dma.resid = tot_length; >>>> } >>>> >>>> static void wr_set_sge(struct ibv_qp_ex *ibqp, uint32_t lkey, uint64_t addr, >>>> @@ -1473,8 +1474,6 @@ static int init_send_wqe(struct rxe_qp *qp, struct rxe_wq *sq, >>>> if (ibwr->send_flags& IBV_SEND_INLINE) { >>>> uint8_t *inline_data = wqe->dma.inline_data; >>>> >>>> - wqe->dma.resid = 0; >>>> - >>>> for (i = 0; i< num_sge; i++) { >>>> memcpy(inline_data, >>>> (uint8_t *)(long)ibwr->sg_list[i].addr, >>>> -- >>>> 2.25.1 >>>> >>>> >>>>