Re: [PATCH] IB/ipoib: CSUM support in connected mode

On 07/30/2015 11:20 AM, Yuval Shaia wrote:
> On Thu, Jul 30, 2015 at 03:58:13PM +0200, Yann Droneaud wrote:
>> Hi,
>>
>> Le jeudi 30 juillet 2015 à 04:46 -0700, Yuval Shaia a écrit :
>>> This enhancement suggests using the IB CRC instead of CSUM in IPoIB 
>>> CM. IPoIB CM uses RC (Reliable Connection), which guarantees 
>>> corruption-free delivery of the packet.
>>>
>>> InfiniBand uses a 32b CRC, which provides stronger data integrity 
>>> protection compared to the 16b IP Checksum.
>>
>> InfiniBand 32b CRC <=> Ethernet 32b CRC, it's link layer, layer 2.
>>
>> IPv4 checksum is at another level, it's internet layer, layer 3.
>>
>>>  So, there is no added value that IP/TCP Checksum provides in the IB 
>>> world.
>>>
>>
>> Sure, the IPv4 checksum is a thing of the past: the checksum was dropped
>> from the IP header in IPv6, which assumes the lower layer, such as
>> Ethernet, provides the required integrity check.
>>
>> I think not checking the IPv4 checksum should be a carefully considered
>> choice, limited to inside a fabric: as I understand your proposal,
>> packets with an invalid checksum will be allowed to go in/out of the fabric.
> Yes, this is why it is controlled by a module parameter.
> Maybe a better choice would be to default it to 0.

In its current form, yes, it should default to 0.

>>
>> It sounds like a departure from the behavior one can expect from an
>> IPv4 network stack.
> It should be considered a network fine-tuning parameter, so an admin who knows the fabric can use it.
>>
>>> The proposal is to tell the network stack that IPoIB-CM supports IP 
>>> Checksum offload. This saves the kernel the time of calculating 
>>> checksums for IPoIB CM packets. The network stack sends the IP packet 
>>> without adding the IP Checksum to the header. On the receive side, 
>>> the IPoIB driver again tells the network stack that the IP Checksum is 
>>> good for the incoming packets, and the network stack skips the IP 
>>> Checksum calculations.
>>>
>>> During connection establishment the driver determines whether the peer
>>> supports IB CRC as checksum. This is done so the driver can calculate
>>> the checksum before transmitting the packet in case the peer does not 
>>> support this feature.
>>>
>>
>> Two questions:
> Three :)

No, he really only had 2, the second one was a line split of the word
checksum-less done by his mailer ;-)

>>
>> - What will see tool such as wireshark/tcpdump when sniffing checksum
> Zero, or whatever the networking layer puts in csum when the H/W supports CSUM offloading.
> Please note that with this patch the driver still supports backward compatibility (per connection).
> This means that for connections with a peer that does not support this functionality, you can expect to see this value filled with the checksum.
>> -less IPv4 packets sent/received on IPoIB interface ?
> No
>>
>> - What might happen if such a checksum-less IPv4 packet is later routed to a different IPv4 network ?
> As noted above, for a network that is open to the outside world this feature should be blocked.
> In general I would say that if a layer 2 terminator device (e.g. a router) exists in the fabric, this feature can't be used and must be blocked.
> Even with this limitation it is still worth using, because it increases throughput.

In its current state, I have my doubts about this patch.  However, it
seems to me that this should be relatively easy to fix in such a way
that you get 90%+ of the performance benefit, and can turn it on by
default, and we don't cause any problems.  Why not perform the checksum
operation on a per connection basis?  This is all IPoIB traffic anyway,
so every send will have a src ip and dst ip.  If the dst ip is link
local to our src ip device, and the connected mode partner is capable of
running without csum, then send that specific packet without doing a
checksum.  If the IP address is not link local, then do the checksum as
normal.  That way if our final destination is on the other side of a
router, we aren't leaking un-checksummed packets.  It means we would
miss out on being able to do checksum-less transfers from host A on
fabric 0 through host B as a router to host C on fabric 1, but I doubt
that's a very common situation to be in.  Or maybe a better way of
putting this is if our next hop IP address != our dest IP address, then
perform the checksum, otherwise if capable of checksum-less operation,
do so.  Can you rework the patch to operate in that manner?
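
The per-packet test described above can be sketched roughly as follows. This is plain user-space C for illustration only, not code from the patch or the ipoib driver: `struct cm_peer`, its `csumless_capable` flag, and `may_skip_csum()` are hypothetical names, and the sketch assumes the caller has already resolved the next hop address for the packet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-connection state; not taken from the ipoib driver. */
struct cm_peer {
    /* Assumed to be set during connection establishment when the peer
     * advertises that it accepts the IB CRC in place of the IP checksum. */
    bool csumless_capable;
};

/*
 * Decide whether one outgoing packet may skip the software checksum.
 * dst_ip and next_hop_ip are IPv4 addresses in the same byte order.
 * If the next hop differs from the destination, the packet is going
 * through a router, so it must carry a valid checksum.
 */
bool may_skip_csum(const struct cm_peer *peer,
                   uint32_t dst_ip, uint32_t next_hop_ip)
{
    if (peer == NULL || !peer->csumless_capable)
        return false;           /* peer needs a real checksum */

    return next_hop_ip == dst_ip; /* directly reachable: safe to skip */
}
```

With a check of this shape the decision is made per packet, so routed traffic is always checksummed while traffic to a directly reachable, capable peer is not.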


-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: 0E572FDD

