On Tue, Aug 17, 2021 at 8:54 AM yangx.jy@xxxxxxxxxxx <yangx.jy@xxxxxxxxxxx> wrote: > > On 2021/8/16 15:47, Zhu Yanjun wrote: > > On Wed, Aug 11, 2021 at 12:33 AM yangx.jy@xxxxxxxxxxx > > <yangx.jy@xxxxxxxxxxx> wrote: > >> On 2021/8/10 11:40, Zhu Yanjun wrote: > >>> On Mon, Aug 9, 2021 at 10:43 PM Xiao Yang<yangx.jy@xxxxxxxxxxx> wrote: > >>>> Resid indicates the residual length of transmitted bytes but current > >>>> rxe sets it to zero for inline data at the beginning. In this case, > >>>> request will transmit zero byte to responder wrongly. > >>> What are the symptoms if resid is set to zero? > >> Hi Yanjun, > >> > >> You can construct code by the attached patch and then run > >> rdma_client/server to reproduce the issue. > >> > >> If resid is set to zero, running rdma_client/server: > >> -------------------------------- > >> # ./rdma_client -s 10.167.220.99 -p 8765 > >> rdma_client: start > >> rdma_client: end 0 > >> > >> # ./rdma_server -s 10.167.220.99 -p 8765 > >> rdma_server: start > >> wc.byte_len 0 recv_msg bbbbbbbbbbbbbbbb > >> rdma_server: end 0 > >> -------------------------------- > >> > >> rdma_client sends zero byte instead of 16 btyes to rdma_server. > >> rdma_server receives zero byte instead of 16 btyes from rdma_client. > >> > >> Please also see the logic about resid in kernel, for example: > >> -------------------------------- > >> int rxe_requester(void *arg) > >> { > >> ... > >> payload = (mask& RXE_WRITE_OR_SEND) ? wqe->dma.resid : 0; > >> ... > >> skb = init_req_packet(qp, wqe, opcode, payload,&pkt); > >> ... > >> } > >> -------------------------------- > >> > >> Best Regards, > >> Xiao Yang > > Follow your steps on the latest rdma-core and linux upstream, > > make tests with the followings: > > " > > diff --git a/librdmacm/examples/rdma_client.c b/librdmacm/examples/rdma_client.c > > index c27047c5..6734757b 100644 > > --- a/librdmacm/examples/rdma_client.c > > +++ b/librdmacm/examples/rdma_client.c > > @@ -52,6 +52,7 @@ static int run(void) > > struct ibv_wc wc; > > int ret; > > > > + memset(send_msg, 'a', 16); > > memset(&hints, 0, sizeof hints); > > hints.ai_port_space = RDMA_PS_TCP; > > ret = rdma_getaddrinfo(server, port,&hints,&res); > > diff --git a/librdmacm/examples/rdma_server.c b/librdmacm/examples/rdma_server.c > > index f9c766b2..afa20996 100644 > > --- a/librdmacm/examples/rdma_server.c > > +++ b/librdmacm/examples/rdma_server.c > > @@ -132,6 +132,7 @@ static int run(void) > > goto out_disconnect; > > } > > > > + printf("wc.byte_len %u recv_msg %s\n", wc.byte_len, recv_msg); > > ret = rdma_post_send(id, NULL, send_msg, 16, send_mr, send_flags); > > if (ret) { > > perror("rdma_post_send"); > > " > > The followings are results: > > > > # ./build/bin/rdma_server -s 10.238.154.61 -p 5486& > > [1] 10812 > > # rdma_server: start > > > > # ./build/bin/rdma_client -s 10.238.154.61 -p 5486& > > [2] 10815 > > # rdma_client: start > > wc.byte_len 16 recv_msg aaaaaaaaaaaaaaaa<--------------it seems > > that inline is 16? > > rdma_server: end 0 > > rdma_client: end 0 > > [1]- Done ./build/bin/rdma_server -s 10.238.154.61 -p 5486 > > [2]+ Done ./build/bin/rdma_client -s 10.238.154.61 -p 5486 > > > > What does your commit fix? > Hi Yanjun, > > You missed the change that sets resid to zero on purpose: > ------------------------------------------------------------- > > diff --git a/providers/rxe/rxe.c b/providers/rxe/rxe.c > index 3c3ea8bb..ed5479fe 100644 > --- a/providers/rxe/rxe.c > +++ b/providers/rxe/rxe.c > @@ -1492,7 +1492,7 @@ static int init_send_wqe(struct rxe_qp *qp, struct rxe_wq *sq, > wqe->iova = ibwr->wr.rdma.remote_addr; > > wqe->dma.length = length; > - wqe->dma.resid = length; > + wqe->dma.resid = 0; > wqe->dma.num_sge = num_sge; > wqe->dma.cur_sge = 0; > wqe->dma.sge_offset = 0; > > ------------------------------------------------------------- > Note: > It is also ok to remove "wqe->dma.resid = length" here because resid has > been set to zero before. > > With the change, running rdma_client/server will show the impact. With the original source code, the rdma_server/rdma_client can work well. Why "wqe->dma.resid = length;" is replaced by "wqe->dma.resid = 0;", then this problem appears? Thanks Zhu Yanjun > > I didn't use new API(e.g. ibv_wr_set_inline_data, ibv_wr_send) to create > a new test > but it is enough to show the impact of resid == 0 by doing some changes > on rxe and > rdma_client/server. > > Best Regards, > Xiao Yang > > Zhu Yanjun > > > >>> Thanks > >>> Zhu Yanjun > >>> > >>>> Resid should be set to the total length of transmitted bytes at the > >>>> beginning. > >>>> > >>>> Note: > >>>> Just remove the useless setting of resid in init_send_wqe(). > >>>> > >>>> Fixes: 1a894ca10105 ("Providers/rxe: Implement ibv_create_qp_ex verb") > >>>> Fixes: 8337db5df125 ("Providers/rxe: Implement memory windows") > >>>> Signed-off-by: Xiao Yang<yangx.jy@xxxxxxxxxxx> > >>>> --- > >>>> providers/rxe/rxe.c | 5 ++--- > >>>> 1 file changed, 2 insertions(+), 3 deletions(-) > >>>> > >>>> diff --git a/providers/rxe/rxe.c b/providers/rxe/rxe.c > >>>> index 3c3ea8bb..3533a325 100644 > >>>> --- a/providers/rxe/rxe.c > >>>> +++ b/providers/rxe/rxe.c > >>>> @@ -1004,7 +1004,7 @@ static void wr_set_inline_data(struct ibv_qp_ex *ibqp, void *addr, > >>>> > >>>> memcpy(wqe->dma.inline_data, addr, length); > >>>> wqe->dma.length = length; > >>>> - wqe->dma.resid = 0; > >>>> + wqe->dma.resid = length; > >>>> } > >>>> > >>>> static void wr_set_inline_data_list(struct ibv_qp_ex *ibqp, size_t num_buf, > >>>> @@ -1035,6 +1035,7 @@ static void wr_set_inline_data_list(struct ibv_qp_ex *ibqp, size_t num_buf, > >>>> } > >>>> > >>>> wqe->dma.length = tot_length; > >>>> + wqe->dma.resid = tot_length; > >>>> } > >>>> > >>>> static void wr_set_sge(struct ibv_qp_ex *ibqp, uint32_t lkey, uint64_t addr, > >>>> @@ -1473,8 +1474,6 @@ static int init_send_wqe(struct rxe_qp *qp, struct rxe_wq *sq, > >>>> if (ibwr->send_flags& IBV_SEND_INLINE) { > >>>> uint8_t *inline_data = wqe->dma.inline_data; > >>>> > >>>> - wqe->dma.resid = 0; > >>>> - > >>>> for (i = 0; i< num_sge; i++) { > >>>> memcpy(inline_data, > >>>> (uint8_t *)(long)ibwr->sg_list[i].addr, > >>>> -- > >>>> 2.25.1 > >>>> > >>>> > >>>>