On 8/3/21 7:21 PM, Zhu Yanjun wrote:
On Wed, Aug 4, 2021 at 10:03 AM Shoaib Rao <rao.shoaib@xxxxxxxxxx> wrote:
On 8/3/21 5:51 PM, Zhu Yanjun wrote:
On Wed, Aug 4, 2021 at 7:53 AM Shoaib Rao <rao.shoaib@xxxxxxxxxx> wrote:
Hi Zhu,
Any update on your testing after applying Bob's fixes?
Did you read my problem carefully?
I mean that before your commit, rxe as a whole worked well.
After your commit, rxe does not work well.
Please reproduce this problem on your host and fix it.
Zhu Yanjun
You posted
In my daily tests, I found that with one host on 5.12-stable and the other host
on 5.14-rc3 + this commit,
rping does not work. Sometimes a crash occurs.
It seems that changing the maximum values breaks backward compatibility.
But without this commit, that is, 5.12-stable <-------> 5.14-rc3,
rping works well.
Zhu Yanjun
I am not sure how you made rxe work, because it did not work for me
or for Bob. Since then, Bob has posted patches for the issue. I
also posted that my changes work on the 5.13.6 kernel. Emails attached.
Even if rxe in 5.14 is somehow working for you, please apply Bob's
patches and then mine and test.
I have already applied this commit
https://patchwork.kernel.org/project/linux-rdma/patch/20210729220039.18549-3-rpearsonhpe@xxxxxxxxx/ .
And with your commit, rxe can not work well.
Zhu Yanjun
I am not sure how anyone can claim that the code works without my
changes. Rxe in Linux 5.14-rc4 is broken due to the following change made to
rxe_cq_post(), which is guaranteed to cause a panic or corruption.
addr = producer_addr(cq->queue, QUEUE_TYPE_TO_CLIENT);
It should be
addr = producer_addr(cq->queue, QUEUE_TYPE_FROM_CLIENT);
The following function also seems wrong
static inline void *producer_addr(struct rxe_queue *q, enum queue_type type)
{
	u32 prod;

	switch (type) {
	case QUEUE_TYPE_FROM_CLIENT:
		/* protect user space index */
		prod = smp_load_acquire(&q->buf->producer_index);
		prod &= q->index_mask;
		break;
	case QUEUE_TYPE_TO_CLIENT:
		prod = q->index;
		break;
	}

	return q->buf->data + (prod << q->log2_elem_size);
}
index should be returned as it is.
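For readers following the indexing argument, here is a minimal user-space sketch of how a power-of-two ring buffer typically turns a producer index into a slot address. It is only an illustration: struct toy_queue, toy_queue_init() and toy_slot_addr() are hypothetical names, not the kernel's struct rxe_queue or any rxe API, and the sketch does not settle which queue type rxe_cq_post() should pass.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy queue for illustration; NOT the kernel's struct rxe_queue. */
struct toy_queue {
	uint32_t index_mask;     /* num_elem - 1; num_elem is a power of two */
	uint32_t log2_elem_size; /* each element is 1 << log2_elem_size bytes */
	uint8_t *data;           /* num_elem << log2_elem_size bytes of storage */
};

static void toy_queue_init(struct toy_queue *q, uint8_t *storage,
			   uint32_t num_elem, uint32_t log2_elem_size)
{
	/* masking only wraps correctly when the element count is a power of two */
	assert(num_elem != 0 && (num_elem & (num_elem - 1)) == 0);
	q->index_mask = num_elem - 1;
	q->log2_elem_size = log2_elem_size;
	q->data = storage;
}

/* Map a producer index (possibly out of range) to the address of its slot. */
static void *toy_slot_addr(const struct toy_queue *q, uint32_t index)
{
	uint32_t slot = index & q->index_mask; /* wrap into [0, num_elem) */

	return q->data + ((size_t)slot << q->log2_elem_size);
}
```

With 8 elements of 16 bytes each, index 3 and index 11 both map to byte offset 48. An index that is already in range is left unchanged by the mask, so in this sketch masking only matters for an index that may be out of range, such as one read from memory another party can write.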
The code has changed again in v5.14-rc4-22-g251a1524293d, so now I have
to try again.
Can we please make sure that the code is working after the application
of each patch? Otherwise it is a moving target.
BTW I liked the old code as it distinctly said what was being returned.
Shoaib
Thanks,
Shoaib
Shoaib
On 7/29/21 5:34 PM, Shoaib Rao wrote:
Thanks Bob.
Zhu can you please apply those patches and test.
Shoaib
On 7/29/21 4:08 PM, Pearson, Robert B wrote:
I found another rxe bug (for SRQ) and sent three bug fixes in a set
including the one you mention. They should all be applied.
-----Original Message-----
From: Jason Gunthorpe <jgg@xxxxxxxx>
Sent: Thursday, July 29, 2021 2:51 PM
To: Shoaib Rao <rao.shoaib@xxxxxxxxxx>
Cc: Zhu Yanjun <zyjzyj2000@xxxxxxxxx>; RDMA mailing list
<linux-rdma@xxxxxxxxxxxxxxx>
Subject: Re: [PATCH v3 0/1] RDMA/rxe: Bump up default maximum values
used via uverbs
On Thu, Jul 29, 2021 at 12:33:14PM -0700, Shoaib Rao wrote:
Can we please accept my initial patch where I bumped up the values of
a few parameters? We have extensively tested with those values. I will
try to resolve the CRC errors and panic and make changes to other
tunables later.
I think Bob posted something for the icrc issues already
Please try to work in a sane fashion, rxe shouldn't be left broken
with so many people apparently interested in it??
Jason