Re: [PATCH 2/2] rxe: correctly calculate iCRC for unaligned payloads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Dec 09, 2019 at 02:07:06PM -0500, Doug Ledford wrote:
> On Tue, 2019-12-03 at 19:46 -0500, Doug Ledford wrote:
> > On Tue, 2019-12-03 at 08:25 -0800, Bart Van Assche wrote:
> > > On 12/2/19 6:03 PM, Steve Wise wrote:
> > > > If RoCE PDUs being sent or received contain pad bytes, then the
> > > > iCRC
> > > > is miscalculated resulting PDUs being emitted by RXE with an
> > > > incorrect
> > > > iCRC, as well as ingress PDUs being dropped due to erroneously
> > > > detecting
> > > > a bad iCRC in the PDU.  The fix is to include the pad bytes, if
> > > > any,
> > > > in iCRC computations.
> > >
> > > Should this description mention that this patch breaks
> > > compatibility
> > > with SoftRoCE drivers that do not include this fix? Do we need a
> > > kernel
> > > module parameter that allows to select either the old or the new
> > > behavior?
> >
> > No.  The original soft-RoCE driver was supposed to be compatible with
> > hardware devices.  Because of this bug, it obviously wasn't.  This is
> > a
> > bug fix, and we do not need to do anything to be compatible with the
> > broken behavior.  Instead, it just needs noting that the soft-RoCE
> > implementation in prior kernels has a known wire format bug that
> > impacts
> > communications with both fixed versions of the driver and real
> > hardware
> > devices.
>
> I've taken these two patches into for-rc (with fixups to the commit
> message on the second, as well as adding a Fixes: tag on the second).
>
> I stand by what I said about not needing a compatibility flag or module
> option for the user to set.  However, that isn't to say that we can't
> autodetect old soft-RoCE peers.  If we get a packet that fails CRC and
> has pad bytes, then re-run the CRC without the pad bytes and see if it
> matches.  If it does, we could A) mark the current QP as being to an old
> soft-RoCE device (causing us to send without including the pad bytes in
> the CRC) and B) allocate a struct old_soft_roce_peer and save the guid
> into that struct and then put that struct on a list that we then search
> any time we are creating a new queue pair and if the new queue pair goes
> to a guid in the list, then we immediately flag that qp as being to an
> old soft roce device and get the right behavior.  It would slow down qp
> creation somewhat due to the list search, but probably not enough to
> worry about.  No one will be doing a 1,000 node cluster MPI job over
> soft-RoCE, so we should never notice the list length causing search
> problems.  A patch to do something like that would be welcome.

Do you find this implementation needed? I see RXE as a development
platform and in my view it is unlikely that someone will run RXE in
production with mixture of different kernel versions, which requires
such compatibility fallback.

Thanks

>
> --
> Doug Ledford <dledford@xxxxxxxxxx>
>     GPG KeyID: B826A3330E572FDD
>     Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux