Re: [PATCH 2/2] rxe: correctly calculate iCRC for unaligned payloads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 2019-12-03 at 19:46 -0500, Doug Ledford wrote:
> On Tue, 2019-12-03 at 08:25 -0800, Bart Van Assche wrote:
> > On 12/2/19 6:03 PM, Steve Wise wrote:
> > > If RoCE PDUs being sent or received contain pad bytes, then the
> > > iCRC
> > > is miscalculated resulting PDUs being emitted by RXE with an
> > > incorrect
> > > iCRC, as well as ingress PDUs being dropped due to erroneously
> > > detecting
> > > a bad iCRC in the PDU.  The fix is to include the pad bytes, if
> > > any,
> > > in iCRC computations.
> > 
> > Should this description mention that this patch breaks
> > compatibility 
> > with SoftRoCE drivers that do not include this fix? Do we need a
> > kernel 
> > module parameter that allows to select either the old or the new
> > behavior?
> 
> No.  The original soft-RoCE driver was supposed to be compatible with
> hardware devices.  Because of this bug, it obviously wasn't.  This is
> a
> bug fix, and we do not need to do anything to be compatible with the
> broken behavior.  Instead, it just needs noting that the soft-RoCE
> implementation in prior kernels has a known wire format bug that
> impacts
> communications with both fixed versions of the driver and real
> hardware
> devices.

I've taken these two patches into for-rc (with fixups to the commit
message on the second, as well as adding a Fixes: tag on the second).

I stand by what I said about not needing a compatibility flag or module
option for the user to set.  However, that isn't to say that we can't
autodetect old soft-RoCE peers.  If we get a packet that fails CRC and
has pad bytes, then re-run the CRC without the pad bytes and see if it
matches.  If it does, we could A) mark the current QP as being to an old
soft-RoCE device (causing us to send without including the pad bytes in
the CRC) and B) allocate a struct old_soft_roce_peer and save the guid
into that struct and then put that struct on a list that we then search
any time we are creating a new queue pair and if the new queue pair goes
to a guid in the list, then we immediately flag that qp as being to an
old soft roce device and get the right behavior.  It would slow down qp
creation somewhat due to the list search, but probably not enough to
worry about.  No one will be doing a 1,000 node cluster MPI job over
soft-RoCE, so we should never notice the list length causing search
problems.  A patch to do something like that would be welcome.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

Attachment: signature.asc
Description: This is a digitally signed message part


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux