Re: [PATCH] IB/ipoib: CSUM support in connected mode

On Thu, Jul 30, 2015 at 11:51:12AM -0400, Doug Ledford wrote:
> On 07/30/2015 11:20 AM, Yuval Shaia wrote:
> > On Thu, Jul 30, 2015 at 03:58:13PM +0200, Yann Droneaud wrote:
> >> Hi,
> >>
> >> On Thursday, 30 July 2015 at 04:46 -0700, Yuval Shaia wrote:
> >>> This enhancement suggests using the IB CRC instead of CSUM in IPoIB
> >>> CM. IPoIB CM uses RC (Reliable Connection), which guarantees
> >>> corruption-free delivery of the packet.
> >>>
> >>> InfiniBand uses a 32b CRC, which provides stronger data integrity
> >>> protection compared to the 16b IP Checksum.
> >>
> >> InfiniBand 32b CRC <=> Ethernet 32b CRC, it's link layer, layer 2.
> >>
> >> IPv4 checksum is at another level, it's internet layer, layer 3.
> >>
> >>>  So, there is no added value that IP/TCP Checksum provides in the IB 
> >>> world.
> >>>
> >>
> >> Sure, IPv4 checksum is a thing of the past: checksum was dropped from
> >> IP header in IPv6: it assumes the lower layer, such as Ethernet,
> >> provides the required integrity check.
> >>
> >> I think not checking the IPv4 checksum should be a carefully
> >> considered choice for use inside a fabric; as I understand your
> >> proposal, packets with an invalid checksum will be allowed to go
> >> in/out of the fabric.
> > Yes, this is why it is controlled by module parameter.
> > Maybe a better choice would be to default it to 0.
> 
> In its current form, yes, it should default to 0.
> 
> >>
> >> It sounds like a departure from the behavior one can expect from an
> >> IPv4 network stack.
> > It should be considered a network fine-tuning parameter, so an admin who knows his fabric can use it.
> >>
> >>> The proposal is to tell network stack that IPoIB-CM supports IP 
> >>> Checksum offload. This enables the kernel to save the time of 
> >>> checksum calculation of IPoIB CM packets. Network sends the IP packet 
> >>> without adding the IP Checksum to the header. On the receive side, 
> >>> IPoIB driver again tells the network stack that IP Checksum is good 
> >>> for the incoming packets and network stack avoids the IP Checksum 
> >>> calculations.
> >>>
> >>> During connection establishment the driver determines whether the
> >>> peer supports IB CRC as checksum. This is done so the driver will be
> >>> able to calculate the checksum before transmitting the packet in
> >>> case the peer does not support this feature.
> >>>
> >>
> >> Two questions:
> > Three :)
> 
> No, he really only had 2, the second one was a line split of the word
> checksum-less done by his mailer ;-)
> 
> >>
> >> - What will a tool such as wireshark/tcpdump see when sniffing checksum
> > Zero, or whatever the networking layer puts in csum when the H/W supports CSUM offloading.
> > Please note that with this patch the driver still supports backward compatibility (per connection).
> > This means that for connections with a peer which does not support this functionality, you can expect to see this value filled with a checksum.
> >> -less IPv4 packets sent/received on IPoIB interface ?
> > No
> >>
> >> - What might happen if such a checksum-less IPv4 packet is later routed to a different IPv4 network ?
> > As noted above, for a network that is open to the outside world this feature should be blocked.
> > In general I would say that if a layer 2 terminator device (e.g. a router) exists in the fabric, this feature can't be used and must be blocked.
> > Even with this limitation it is still worth using, because it increases throughput.
> 
> In its current state, I have my doubts about this patch.  However, it
> seems to me that this should be relatively easy to fix in such a way
> that you get 90%+ of the performance benefit, and can turn it on by
> default, and we don't cause any problems.  Why not perform the checksum
> operation on a per connection basis?  This is all IPoIB traffic anyway,
This part is already implemented.
Actually, this is the main purpose of adding the 'caps' field to ipoib_cm_tx.
The peer capabilities (currently only one option, but the design lets us add
up to 12 capabilities in the future) are passed in IPoIB's private data and
saved in ipoib_cm_tx.caps on a per-connection basis.
Then, in ipoib_cm_send, the decision is made based on that (and on some
other conditions) and, if needed, the driver calculates the checksum just
before sending.
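
To illustrate the flow described above, here is a minimal user-space model
of it. This is a sketch, not the patch code: the struct cm_tx_model, the
CM_CAP_CSUM_LESS bit, the CM_CAPS_MASK value and both helper functions are
hypothetical names made up for this example, and the "other conditions"
checked in ipoib_cm_send are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bit; the name and value used in the patch may differ. */
#define CM_CAP_CSUM_LESS 0x0001u
/* Assumption: 12 bits are reserved for capabilities, as stated in the text. */
#define CM_CAPS_MASK     0x0fffu

/* Stands in for the per-connection state (ipoib_cm_tx in the driver). */
struct cm_tx_model {
	uint16_t caps;
};

/* Record what the peer advertised in the CM private data at connect time. */
static void cm_tx_set_peer_caps(struct cm_tx_model *tx, uint16_t private_data_caps)
{
	assert(tx != NULL);
	tx->caps = private_data_caps & CM_CAPS_MASK;
}

/*
 * Decide, per send, whether the checksum has to be computed in software.
 * csum_pending models a packet whose checksum the stack left for the
 * device to fill in (CHECKSUM_PARTIAL in the kernel).
 */
static bool cm_tx_needs_sw_csum(const struct cm_tx_model *tx, bool csum_pending)
{
	assert(tx != NULL);
	if (!csum_pending)
		return false;	/* checksum is already in the packet */
	/* Peer cannot work without a checksum: compute it before sending. */
	return (tx->caps & CM_CAP_CSUM_LESS) == 0;
}
```

With this model, a connection to a peer that advertised the bit sends the
packet as is, while a connection to an older peer falls back to a software
checksum, which is the per-connection backward compatibility mentioned
earlier in the thread.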
> so every send will have a src ip and dst ip.  If the dst ip is link
> local to our src ip device, and the connected mode partner is capable of
> running without csum, then send that specific packet without doing a
> checksum.  If the IP address is not link local, then do the checksum as
> normal.  That way if our final destination is on the other side of a
> router, we aren't leaking un-checksummed packets.  It means we would
> miss out on being able to do checksum-less transfers from host A on
> fabric 0 through host B as a router to host C on fabric 1, but I doubt
> that's a very common situation to be in.  Or maybe a better way of
> putting this is if our next hop IP address != our dest IP address, then
> perform the checksum, otherwise if capable of checksum-less operation,
> do so.  Can you rework the patch to operate in that manner?
I think the concern with a 'router' is that when a packet goes into it
and then comes out of it, we cannot trust the IB CRC end to end, as this is
a layer 2 CRC.
So, if I understand you correctly, you suggest treating every host beyond
a router as one that does not support this "fake" and calculating csum
for it?
This makes sense to me, but does it cover all such cases (where we can't
trust end-to-end IB-CRC)?
If yes, then sure, it is easy to implement.
This way we can default it to 1 and get rid of this module param.
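
A rough sketch of the per-packet rule as I read the suggestion, written as
a stand-alone function rather than driver code. The function name and its
parameters are hypothetical; how the next hop address would actually be
obtained in the send path is not shown and is an open point.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the packet may be sent without a checksum.
 *
 * next_hop       - IPv4 address of the next hop for this packet
 * dst            - IPv4 destination address of this packet
 * peer_csum_less - peer on this connection advertised checksum-less support
 *
 * All three inputs are assumptions of this sketch, not fields from the patch.
 */
static bool can_skip_csum(uint32_t next_hop, uint32_t dst, bool peer_csum_less)
{
	/*
	 * A next hop that differs from the destination means the packet
	 * will be forwarded by a router, so the IB CRC does not cover the
	 * whole path and the checksum must be calculated as usual.
	 */
	if (next_hop != dst)
		return false;

	/* Destination is the direct peer: skip only if it supports that. */
	return peer_csum_less;
}
```

For example, a packet whose next hop and destination are both 10.0.0.2 on
a connection to a capable peer would go out without a checksum, while the
same packet sent via a gateway at 10.0.0.1 would be checksummed.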
> 
> 
> -- 
> Doug Ledford <dledford@xxxxxxxxxx>
>               GPG KeyID: 0E572FDD
> 
> 

