Re: insight into a WARNING from softROCE

On Mon, Dec 25, 2017 at 6:09 AM, Moni Shoua <monis@xxxxxxxxxxxx> wrote:
> On Fri, Dec 22, 2017 at 5:43 PM, Olga Kornievskaia
> <olga.kornievskaia@xxxxxxxxx> wrote:
>>
>> Hi Moni,
>>
>>> On Dec 21, 2017, at 3:19 AM, Moni Shoua <monis@xxxxxxxxxxxx> wrote:
>>>
>>> Hi Olga
>>> As far as I can tell the warning in
>>> drivers/infiniband/sw/rxe/rxe_comp.c:741 went through check_psn() ->
>>> COMPST_ERROR_RETRY -> COMPST_ERROR. In that case the wqe_status should
>>> have been IB_WC_RETRY_EXC_ERR and not IB_WC_SUCCESS.
>>
>> My conclusion was from trying to figure out why the warning was seen in /var/log/messages, which was then followed by an error that the retry limit was exceeded and the connection manager dropping and re-establishing the connection. Sounds like my conclusion wasn't correct.
>>
>> It seems like this warning is meant to signal that something went wrong in the code and this state of wqe status success yet being in error state is unexpected.
>>
>> I thought the developers would be interested in investigating but maybe it’s not an interesting condition. Especially since it sounds like your assessment is that packet loss causes the hiccup.
>>
>> I wish then there was a warning that notes packet loss and warns the user. I understand the protocol assumes lossless communication so it shouldn’t be dealing w packet loss.
>>
>>> Can you please be more specific and explain how did you get to this conclusion?
>>
>> What other specifics can I provide? I added printks trying to trace the WARN message. Should I share a patch w printks w the output?
> I wonder how we get to this point when status is IB_WC_SUCCESS but
> through retry exceeded error. If you traced it maybe you can explain.

Sorry, I was pulled off into testing over hardware RDMA and
haven't gotten back to softROCE. I will try to get back to it soon. I did
re-run this on real machines (instead of VMs) and I saw the same
WARNING (on a 4.15-rc4 kernel). In this case, softROCE was run over 10G
NICs using ibg drivers.

>>> BTW, what was the test you were running?
>>
>> I was running NFS testsuite (cthon). This was done on a laptop running 2VMs.
> I don't promise that we will run it immediately but if you provide
> a HOWTO for this test I will appreciate it.

git clone git://git.linux-nfs.org/projects/steved/cthon04.git
cd cthon04
make
mount -o vers=4.1,rdma,port=20049 <serverip>:<servermountpoint> /mnt
(If instructions are needed to set up NFS over RDMA, they depend
on which distro you are using; Red Hat has a HOWTO for setting up NFSoRDMA.)
./runtest -a -f /mnt/data
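
The steps above can also be sketched as a small dry-run script that only
prints the commands in order, so they can be reviewed before being run as
root. SERVER_IP, SERVER_EXPORT and MOUNT_POINT are hypothetical placeholders
(the defaults below are made-up example values, not my actual setup), and
cthon_steps is just an illustrative helper name:

```shell
#!/bin/sh
# Hypothetical placeholders: substitute your own NFS server and export.
SERVER_IP="${SERVER_IP:-192.0.2.10}"
SERVER_EXPORT="${SERVER_EXPORT:-/export}"
MOUNT_POINT="${MOUNT_POINT:-/mnt}"

# Print (do not execute) the cthon HOWTO steps in order.
cthon_steps() {
    echo "git clone git://git.linux-nfs.org/projects/steved/cthon04.git"
    echo "cd cthon04"
    echo "make"
    # NFSv4.1 over RDMA on port 20049, as in the mount line above.
    echo "mount -o vers=4.1,rdma,port=20049 ${SERVER_IP}:${SERVER_EXPORT} ${MOUNT_POINT}"
    echo "./runtest -a -f ${MOUNT_POINT}/data"
}

cthon_steps
```

To actually run the test, the printed lines can be executed one by one as
root once the RDMA NFS server side is set up.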