Re: receive side CRC computation in siw.

"Bernard Metzler" <BMT@xxxxxxxxxxxxxx> · Thu, 13 Jun 2019 11:36:38 +0000

-----"Tom Talpey" <tom@xxxxxxxxxx> wrote: -----

>To: "Jason Gunthorpe" <jgg@xxxxxxxx>
>From: "Tom Talpey" <tom@xxxxxxxxxx>
>Date: 06/12/2019 10:34PM
>Cc: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>,
>linux-rdma@xxxxxxxxxxxxxxx
>Subject: [EXTERNAL] Re: receive side CRC computation in siw.
>
>On 6/12/2019 4:13 PM, Jason Gunthorpe wrote:
>> On Wed, Jun 12, 2019 at 04:07:53PM -0400, Tom Talpey wrote:
>>> On 6/12/2019 11:21 AM, Jason Gunthorpe wrote:
>>>> On Tue, Jun 11, 2019 at 11:11:08AM -0400, Tom Talpey wrote:
>>>>> On 6/11/2019 9:21 AM, Bernard Metzler wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> If enabled for siw, during receive operation, a crc32c over
>>>>>> header and data is being generated and checked. So far, siw
>>>>>> was generating that CRC from the content of the just written
>>>>>> target buffer. What kept me busy last weekend were spurious
>>>>>> CRC errors, if running qperf. I finally found the application
>>>>>> is constantly writing the target buffer while data are placed
>>>>>> concurrently, which sometimes races with the CRC computation
>>>>>> for that buffer, and yields a broken CRC.
>>>>>
>>>>> Well, that's a clear bug in the application, assuming siw has
>>>>> not yet delivered a send completion for the operation using
>>>>> the buffer. This is a basic Verbs API contract.
>>>>
>>>> May be so, but a kernel driver must not make any assumptions
>about the
>>>> content of memory controlled by user. So it is clearly wrong to
>write
>>>> data to a user buffer and then read it again to compute a CRC.
>>>
>>> But it's not a user buffer. It's been mapped into the kernel for
>the
>>> purpose of registering and performing data transfer This is
>standard
>>> i/o processing. Both kernel and user have access.
>> 
>> It is a user buffer because the user has access. In fact it may not
>> even be mapped into the kernel address space.
>
>Belaboring this point a bit, but SIW certainly maps it, in order to
>copy. An adapter maps it, via dma_map, in order to do the same. My
>point is simply that if the kernel tried to prevent that, the whole
>i/o model would break down.
>
>In other words, if a hardware adapter were doing this same thing,
>would you consider it out of spec? If so, why?
>
>Tom.
>
>>> Furthermore, an RDMA hardware adapter has zero notion of user
>buffers.
>>> All it gets is a registration, with memory described by dma
>addresses.
>>> It can perform whatever memory operations are required on them,
>and the
>>> kernel isn't even in the loop.
>> 
>> Adapters cannot make assumptions about data they place in memory
>> buffers - ie they cannot write something and then read it back on
>the
>> assumption it has not changed. They cannot read something twice on
>the
>> assumption it has not changed, etc. It is a security requirement.
>> 
>>>> All the applications touching buffers without waiting for a
>completion
>>>> are relying on some extended behavior outside the specification,
>but
>>>> they cannot cause the kernel to malfunction and report bogus data
>>>> integrity errors.
>>>
>>> Ok, this I agree with, but the RDMA specifications were quite
>careful
>>> about it. And we *definitely* don't want to require that the
>providers
>>> all start double-buffering incoming data, in order to shield an
>>> uncomplying application from itself. To double buffer RDMA Writes
>(and
>>> Sends) would undo the entire direct data placement design!
>>>
>>> Bernard, I'd still welcome your thoughts on whether you can
>compute
>>> the MPA CRC inline in SIW during the copy_to_user. Avoiding the
>overhead
>>> of reading back the data after copying could be a speedup for you?

It would have to be integrated into skb_copy_bits(). Best
allowing the caller to provide a variable digets which gets
updated on the go. So we might propose that to the right people?

In the mean time, I may have to accept a substantial
throughput breakdown if the CRC is switched on. For large data
chunks, perf is close to cut to half, compared to what I had
before with CRC.

Interestingly, the penalty mainly comes from walking
the skb again, and not from touching the data twice:
If I would provide an intermediate scratch buffer where I
copy into the skb content first, checksum it, and then
memcpy it to the target, I would almost maintain current
perf.

That obviously contradicts the RDMA concept ;) And it is
not really nice. At the other hand, since we would need
only one page size buffer per core (since siw RX runs in
softirq), it might not be a big waste of memory though...
What do others think?

>> 
>> Copy and CRC is obviously the right thing to do.
>> 
>> Jason
>> 
I think you mean CRC first and copy then.

Many thanks for that fruitful discussion.
Bernard.