Re: receive side CRC computation in siw.

"Bernard Metzler" <BMT@xxxxxxxxxxxxxx> · Wed, 12 Jun 2019 09:08:43 +0000

-----"Tom Talpey" <tom@xxxxxxxxxx> wrote: -----

>To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>,
>linux-rdma@xxxxxxxxxxxxxxx
>From: "Tom Talpey" <tom@xxxxxxxxxx>
>Date: 06/11/2019 05:11PM
>Subject: [EXTERNAL] Re: receive side CRC computation in siw.
>
>On 6/11/2019 9:21 AM, Bernard Metzler wrote:
>> Hi all,
>> 
>> If enabled for siw, during receive operation, a crc32c over
>> header and data is being generated and checked. So far, siw
>> was generating that CRC from the content of the just written
>> target buffer. What kept me busy last weekend were spurious
>> CRC errors, if running qperf. I finally found the application
>> is constantly writing the target buffer while data are placed
>> concurrently, which sometimes races with the CRC computation
>> for that buffer, and yields a broken CRC.
>
>Well, that's a clear bug in the application, assuming siw has
>not yet delivered a send completion for the operation using
>the buffer. This is a basic Verbs API contract.
>
Well, it is receive side. And it might go non-signaled.
The case I looked at was a RDMA Write which got placed
into a buffer which is concurrently being written by
the application. Yes, the Verbs API claims adapter's
ownership for buffers referenced by posted, but not
yet completed operations. At target side, there is no
outstanding local operation being completed by an inbound
WRITE (as a difference to an inbound READ response).
That's why I am unsure.

>> siw uses skb_copy_bits() to move the data. I now added an
>> extra round of skb walking via __skb_checksum() in front of
>> it, which resolves the issue. Unfortunately, performance
>> significantly drops with that (some 30% or worse compared
>> to generating the CRC from a linear buffer).
>> 
>> To preserve performance for kernel clients, I propose
>> checksumming the data before the copy only for user
>> land applications, and leave it as is for kernel clients.
>> I am not aware of kernel clients which are constantly
>> reading/writing a target buffer to detect it has been written.
>
>This, too, is an invalid application. The RDMA provider is free
>to write the target buffer at any time. Many implementations
>take full advantage of this by placing received RDMA Writes
>in memory prior to validating their packets' checksum(s). If
>there is a mismatch, retries are initiated. The realtime
>contents of the buffers, prior to a completion, are undefined.
>
Right, placing the data first and checksum those only thereafter
adds another level of confidence the buffer contains the right
data. Something a offloaded RDMA engine will typically not do
though. And as a result, some applications out there (such as
qperf) assume to have the right to write the buffer anytime.

Furthermore, a SW implementation of iWarp, such as siw,
is not fully integrated with the transport (TCP). While
MPA detects the CRC error, the segment cannot silently be
dropped and re-transmitted by the transport, since typically
already acknowledged to the peer. With that, an invalid CRC
will cause a connection loss. siw will just send an RDMAP
TERMINATE frame with error code 'CRC mismatch' as defined
per spec.

Thanks,
Bernard.

>Tom.
>
>> I also checked other kernel code using skb_copy_bits(),
>> which also needs to checksum the received data. Those code
>> (such as nvme tcp) also does the CRC on the linear buffer
>> after data receive.
>> 
>> The best solution might be to fold the CRC into the
>> skb_copy_bits() function itself. That being something we
>> might propose later?
>> 
>> Thoughts?
>> 
>> 
>> Many thanks,
>> Bernard.
>> 
>> 
>> 
>
>