-----"Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx> wrote: -----

>To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>, sagi@xxxxxxxxxxx,
>hch@xxxxxx
>From: "Krishnamraju Eraparaju" <krishna2@xxxxxxxxxxx>
>Date: 03/16/2020 05:20PM
>Cc: linux-nvme@xxxxxxxxxxxxxxxxxxx, linux-rdma@xxxxxxxxxxxxxxx,
>"Nirranjan Kirubaharan" <nirranjan@xxxxxxxxxxx>, "Potnuri Bharat
>Teja" <bharat@xxxxxxxxxxx>
>Subject: [EXTERNAL] broken CRCs at NVMeF target with SIW & NVMe/TCP
>transports
>
>I'm seeing broken CRCs at NVMeF target while running the below program
>at host. Here RDMA transport is SoftiWARP, but I'm also seeing the
>same issue with NVMe/TCP as well.
>
>It appears to me that the same buffer is being rewritten by the
>application/ULP before getting the completion for the previous
>requests.
>HW based transports (like iw_cxgb4) are not showing this issue because
>they copy/DMA and then compute the CRC on the copied buffer.
>

Thanks Krishna!

Yes, I see those errors as well. For NVMe/TCP, I see them if the data digest is enabled, which is functionally similar to having CRC enabled for iWARP. This is the '-G' command line switch you suggested for the TCP connect.

With SoftiWARP at the host side and iWARP hardware at the target side, CRC gets enabled. Then I see the problem at the host side for SEND type work requests: a page of data referenced by the SEND sometimes gets modified by the ULP after the CRC computation and before the data gets handed over (copied) to TCP via kernel_sendmsg(), and long before the ULP reaps a work completion for that SEND. So the ULP sometimes touches the buffer after passing ownership to the provider, and before getting it back with a matching work completion.

With siw and CRC switched off, this issue goes undetected, since TCP copies the buffer at some point in time and computes its TCP/IP checksum only on that stable copy, or the checksum computation is typically even offloaded to the NIC.
Another question is whether we may end up placing stale data, or whether closing the file recovers from the error by re-sending the affected data. In my experiments so far, I have never detected broken file content after closing the file.

Thanks,
Bernard.

>Please share your thoughts/comments/suggestions on this.
>
>Commands used:
>--------------
>#nvme connect -t tcp -G -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for NVMe/TCP
>#nvme connect -t rdma -a 102.1.1.6 -s 4420 -n nvme-ram0 ==> for SoftiWARP
>#mkfs.ext3 -F /dev/nvme0n1 (issue occurs more frequently with ext3
>than ext4)
>#mount /dev/nvme0n1 /mnt
>#Then run the below program:
>#include <stdlib.h>
>#include <stdio.h>
>#include <string.h>
>#include <unistd.h>
>
>int main() {
>        int i;
>        char* line1 = "123";
>        FILE* fp;
>        while(1) {
>                fp = fopen("/mnt/tmp.txt", "w");
>                setvbuf(fp, NULL, _IONBF, 0);
>                for (i=0; i<100000; i++)
>                    if ((fwrite(line1, 1, strlen(line1), fp) != strlen(line1)))
>                        exit(1);
>
>                if (fclose(fp) != 0)
>                        exit(1);
>        }
>return 0;
>}
>
>DMESG at NVMe/TCP Target:
>[  +5.119267] nvmet_tcp: queue 2: cmd 83 pdu (6) data digest error: recv 0xb1acaf93 expected 0xcd0b877d
>[  +0.000017] nvmet: ctrl 1 fatal error occurred!
>
>Thanks,
>Krishna.