Re: Re: receive side CRC computation in siw.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



-----"Tom Talpey" <tom@xxxxxxxxxx> wrote: -----

>To: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>
>From: "Tom Talpey" <tom@xxxxxxxxxx>
>Date: 06/13/2019 02:25PM
>Cc: "Jason Gunthorpe" <jgg@xxxxxxxx>, linux-rdma@xxxxxxxxxxxxxxx
>Subject: [EXTERNAL] Re: receive side CRC computation in siw.
>
>On 6/13/2019 7:36 AM, Bernard Metzler wrote:
>> -----"Tom Talpey" <tom@xxxxxxxxxx> wrote: -----
>> 
>>> To: "Jason Gunthorpe" <jgg@xxxxxxxx>
>>> From: "Tom Talpey" <tom@xxxxxxxxxx>
>>> Date: 06/12/2019 10:34PM
>>> Cc: "Bernard Metzler" <BMT@xxxxxxxxxxxxxx>,
>>> linux-rdma@xxxxxxxxxxxxxxx
>>> Subject: [EXTERNAL] Re: receive side CRC computation in siw.
>>>
>>> On 6/12/2019 4:13 PM, Jason Gunthorpe wrote:
>>>> On Wed, Jun 12, 2019 at 04:07:53PM -0400, Tom Talpey wrote:
>>>>> On 6/12/2019 11:21 AM, Jason Gunthorpe wrote:
>>>>>> On Tue, Jun 11, 2019 at 11:11:08AM -0400, Tom Talpey wrote:
>>>>>>> On 6/11/2019 9:21 AM, Bernard Metzler wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> If enabled for siw, during receive operation, a crc32c over
>>>>>>>> header and data is being generated and checked. So far, siw
>>>>>>>> was generating that CRC from the content of the just written
>>>>>>>> target buffer. What kept me busy last weekend were spurious
>>>>>>>> CRC errors, if running qperf. I finally found the application
>>>>>>>> is constantly writing the target buffer while data are placed
>>>>>>>> concurrently, which sometimes races with the CRC computation
>>>>>>>> for that buffer, and yields a broken CRC.
>>>>>>>
>>>>>>> Well, that's a clear bug in the application, assuming siw has
>>>>>>> not yet delivered a send completion for the operation using
>>>>>>> the buffer. This is a basic Verbs API contract.
>>>>>>
>>>>>> May be so, but a kernel driver must not make any assumptions
>>> about the
>>>>>> content of memory controlled by user. So it is clearly wrong to
>>> write
>>>>>> data to a user buffer and then read it again to compute a CRC.
>>>>>
>>>>> But it's not a user buffer. It's been mapped into the kernel for
>>> the
>>>>> purpose of registering and performing data transfer This is
>>> standard
>>>>> i/o processing. Both kernel and user have access.
>>>>
>>>> It is a user buffer because the user has access. In fact it may
>not
>>>> even be mapped into the kernel address space.
>>>
>>> Belaboring this point a bit, but SIW certainly maps it, in order
>to
>>> copy. An adapter maps it, via dma_map, in order to do the same. My
>>> point is simply that if the kernel tried to prevent that, the
>whole
>>> i/o model would break down.
>>>
>>> In other words, if a hardware adapter were doing this same thing,
>>> would you consider it out of spec? If so, why?
>>>
>>> Tom.
>>>
>>>>> Furthermore, an RDMA hardware adapter has zero notion of user
>>> buffers.
>>>>> All it gets is a registration, with memory described by dma
>>> addresses.
>>>>> It can perform whatever memory operations are required on them,
>>> and the
>>>>> kernel isn't even in the loop.
>>>>
>>>> Adapters cannot make assumptions about data they place in memory
>>>> buffers - ie they cannot write something and then read it back on
>>> the
>>>> assumption it has not changed. They cannot read something twice
>on
>>> the
>>>> assumption it has not changed, etc. It is a security requirement.
>>>>
>>>>>> All the applications touching buffers without waiting for a
>>> completion
>>>>>> are relying on some extended behavior outside the
>specification,
>>> but
>>>>>> they cannot cause the kernel to malfunction and report bogus
>data
>>>>>> integrity errors.
>>>>>
>>>>> Ok, this I agree with, but the RDMA specifications were quite
>>> careful
>>>>> about it. And we *definitely* don't want to require that the
>>> providers
>>>>> all start double-buffering incoming data, in order to shield an
>>>>> uncomplying application from itself. To double buffer RDMA
>Writes
>>> (and
>>>>> Sends) would undo the entire direct data placement design!
>>>>>
>>>>> Bernard, I'd still welcome your thoughts on whether you can
>>> compute
>>>>> the MPA CRC inline in SIW during the copy_to_user. Avoiding the
>>> overhead
>>>>> of reading back the data after copying could be a speedup for
>you?
>> 
>> It would have to be integrated into skb_copy_bits(). Best
>> allowing the caller to provide a variable digets which gets
>> updated on the go. So we might propose that to the right people?
>> 
>> In the mean time, I may have to accept a substantial
>> throughput breakdown if the CRC is switched on. For large data
>> chunks, perf is close to cut to half, compared to what I had
>> before with CRC.
>> 
>> Interestingly, the penalty mainly comes from walking
>> the skb again, and not from touching the data twice:
>> If I would provide an intermediate scratch buffer where I
>> copy into the skb content first, checksum it, and then
>> memcpy it to the target, I would almost maintain current
>> perf.
>> 
>> That obviously contradicts the RDMA concept ;) And it is
>> not really nice. At the other hand, since we would need
>> only one page size buffer per core (since siw RX runs in
>> softirq), it might not be a big waste of memory though...
>> What do others think?
>
>I hate the idea of forcing data through a bounce buffer. And I think
>that the performance hit will increase in a realworld workload, since
>in single-flow tests the CPU cache basically stays warm after your
>skb_copy_bits(). Once the processor gets more busy, cache eviction
>will change things for the worse - just when you least want it to.
>
>Tom.
>
Right. Sounds ugly. But I am not sure though about the
bad caching effects. It would be a per-CPU buffer, not
per-QP. Only one QP at a time would have access to that
buffer and do skb_copy_bits() into it and memcpy()
out of it back to back. siw rx path runs in softirq and
does not get interrupted by RX of another QP on the same
CPU.

Thanks
Bernard.
>> 
>> 
>>>>
>>>> Copy and CRC is obviously the right thing to do.
>>>>
>>>> Jason
>>>>
>> I think you mean CRC first and copy then.
>> 
>> Many thanks for that fruitful discussion.
>> Bernard.
>> 
>> 
>> 
>
>




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux