On 3/17/2020 5:31 AM, Bernard Metzler wrote:
-----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----
To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>, sagi@xxxxxxxxxxx,
hch@xxxxxx
From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
Date: 03/16/2020 05:20PM
Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx,
"Nirranjan Kirubaharan" <nirranjan@xxxxxxxxxxx>, "Potnuri Bharat
Teja" <bharat@xxxxxxxxxxx>
Subject: [EXTERNAL] broken CRCs at NVMeF target with SIW & NVMe/TCP
transports
I'm seeing broken CRCs at the NVMeF target while running the below
program at the host. Here the RDMA transport is SoftiWARP, but I'm
seeing the same issue with NVMe/TCP as well.
It appears to me that the same buffer is being rewritten by the
application/ULP before the completion for the previous request is
received. HW-based transports (like iw_cxgb4) do not show this issue
because they copy/DMA the data and then compute the CRC on the copied
buffer.
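For illustration only (this is not the driver code; the buffer size
and contents below are made up), a small self-contained program showing
why a digest computed directly over the caller's buffer breaks when the
buffer is rewritten before the data is copied out, while a digest over
a stable copy does not (build with -lz):

#include <stdio.h>
#include <string.h>
#include <zlib.h>

int main(void)
{
    unsigned char buf[4096], copy[4096];

    memset(buf, 'A', sizeof(buf));

    /* Case 1: digest computed directly over the caller's buffer. */
    unsigned long crc_sent = crc32(0L, buf, sizeof(buf));
    /* The ULP rewrites part of the page before the data is copied
     * out to the wire, so the transmitted bytes no longer match. */
    memcpy(buf, "123", 3);
    unsigned long crc_recv = crc32(0L, buf, sizeof(buf));
    printf("in-place buffer: sent 0x%08lx recv 0x%08lx -> %s\n",
           crc_sent, crc_recv, crc_sent == crc_recv ? "match" : "MISMATCH");

    /* Case 2: copy first (as a copying/DMA'ing transport effectively
     * does), then digest the stable copy; a later rewrite of the
     * original buffer no longer matters. */
    memset(buf, 'A', sizeof(buf));
    memcpy(copy, buf, sizeof(copy));
    crc_sent = crc32(0L, copy, sizeof(copy));
    memcpy(buf, "123", 3);               /* rewrite only the original */
    crc_recv = crc32(0L, copy, sizeof(copy));
    printf("stable copy:     sent 0x%08lx recv 0x%08lx -> %s\n",
           crc_sent, crc_recv, crc_sent == crc_recv ? "match" : "MISMATCH");

    return 0;
}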
Thanks Krishna!
Yes, I see those errors as well. For NVMe/TCP, I see it if the
data digest is enabled, which is functionally similar to having CRC
enabled for iWARP. That is what the '-G' command line switch in your
TCP connect command enables.
With SoftiWARP at the host side and iWARP hardware at the target
side, CRC gets enabled. Then I see that problem at the host side for
SEND type work requests: a page of data referenced by the SEND
sometimes gets modified by the ULP after CRC computation and before
the data gets handed over (copied) to TCP via kernel_sendmsg(), and
far before the ULP reaps a work completion for that SEND. So the ULP
sometimes touches the buffer after passing ownership to the provider,
and before getting it back through a matching work completion.
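As an illustration of the ownership rule being described (userspace
verbs, just a sketch; QP/CQ/MR setup is omitted, the function name is
made up, and the path discussed here is of course the in-kernel NVMe
host driver, not a userspace consumer):

#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

/* Post one signalled SEND over 'buf' and wait for its completion.
 * Between ibv_post_send() and the reaped completion the buffer is
 * owned by the provider; a software provider like siw may compute
 * the CRC and copy the data at any point in that window, so the
 * ULP must not touch 'buf' until the completion comes back. */
static int send_and_wait(struct ibv_qp *qp, struct ibv_cq *cq,
                         struct ibv_mr *mr, void *buf, uint32_t len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
    }, *bad_wr;
    struct ibv_wc wc;
    int n;

    if (ibv_post_send(qp, &wr, &bad_wr))
        return -1;

    /* Writing to 'buf' here would be exactly the bug discussed above. */

    do {
        n = ibv_poll_cq(cq, 1, &wc);    /* busy-poll for the completion */
    } while (n == 0);

    if (n < 0 || wc.status != IBV_WC_SUCCESS)
        return -1;

    /* Only now does 'buf' belong to the ULP again and may be reused. */
    memset(buf, 0, len);
    return 0;
}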
Well, that's a plain ULP bug. It's the very definition of a
send queue work request completion that the buffer has been
accepted by the LLP. Would the ULP read a receive buffer before
getting a completion? Same thing. Would the ULP complain if
its application consumer wrote data into async i/o O_DIRECT
buffers, or while it computed a krb5i hash? Yep.
With siw and CRC switched off, this issue goes undetected, since
TCP copies the buffer at some point in time and computes its TCP/IP
checksum only on that stable copy, or typically even offloads it to
the NIC.
An excellent test, and I'd love to know what ULPs/apps you
caught with it.
Tom.
Another question is whether we may end up placing stale data, or
whether closing the file recovers from the error by re-sending the
affected data. In my experiments so far, I have never detected
broken file content after file close.
Thanks,
Bernard.
Please share your thoughts/comments/suggestions on this.
Commands used:
--------------
#nvme connect -t tcp -G -a 102.1.1.6 -s 4420 -n nvme-ram0   ==> for NVMe/TCP
#nvme connect -t rdma -a 102.1.1.6 -s 4420 -n nvme-ram0     ==> for SoftiWARP
#mkfs.ext3 -F /dev/nvme0n1   (the issue occurs more frequently with ext3 than with ext4)
#mount /dev/nvme0n1 /mnt
#Then run the below program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main() {
    int i;
    char *line1 = "123";
    FILE *fp;

    while (1) {
        fp = fopen("/mnt/tmp.txt", "w");
        if (fp == NULL)
            exit(1);
        /* Unbuffered stdio: every fwrite() goes to the file
         * immediately instead of being batched in a stdio buffer. */
        setvbuf(fp, NULL, _IONBF, 0);
        for (i = 0; i < 100000; i++)
            if (fwrite(line1, 1, strlen(line1), fp) != strlen(line1))
                exit(1);
        if (fclose(fp) != 0)
            exit(1);
    }
    return 0;
}
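The program can be built and run with, for example (the file name is
arbitrary):

#gcc -O2 -o write_loop write_loop.c
#./write_loop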
DMESG at NVMe/TCP Target:
[ +5.119267] nvmet_tcp: queue 2: cmd 83 pdu (6) data digest error: recv 0xb1acaf93 expected 0xcd0b877d
[ +0.000017] nvmet: ctrl 1 fatal error occurred!
Thanks,
Krishna.