Re: [PATCH 2/2] rxe: correctly calculate iCRC for unaligned payloads

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 12/10/2019 11:24 PM, Doug Ledford wrote:
On Tue, 2019-12-10 at 08:54 +0200, Leon Romanovsky wrote:
On Mon, Dec 09, 2019 at 02:07:06PM -0500, Doug Ledford wrote:

I've taken these two patches into for-rc (with fixups to the commit
message on the second, as well as adding a Fixes: tag on the
second).

I stand by what I said about not needing a compatibility flag or
module
option for the user to set.  However, that isn't to say that we
can't
autodetect old soft-RoCE peers.  If we get a packet that fails CRC
and
has pad bytes, then re-run the CRC without the pad bytes and see if
it
matches.  If it does, we could A) mark the current QP as being to an
old
soft-RoCE device (causing us to send without including the pad bytes
in
the CRC) and B) allocate a struct old_soft_roce_peer and save the
guid
into that struct and then put that struct on a list that we then
search
any time we are creating a new queue pair and if the new queue pair
goes
to a guid in the list, then we immediately flag that qp as being to
an
old soft roce device and get the right behavior.  It would slow down
qp
creation somewhat due to the list search, but probably not enough to
worry about.  No one will be doing a 1,000 node cluster MPI job over
soft-RoCE, so we should never notice the list length causing search
problems.  A patch to do something like that would be welcome.

Do you find this implementation needed? I see RXE as a development
platform and in my view it is unlikely that someone will run RXE in
production with mixture of different kernel versions, which requires
such compatibility fallback.

It's not a requirement, that's why I took the patches as they were.  It
would just be a "nice to have".

The counterargument to this is that it only extends the protocol bug
into the future, and for one single RoCE implementation. No hardware
implementation will do this, as it violates the protocol. And, it
potentially opens a silent data corruption, by accepting packets which
don't actually pass the checksum.

Personally, I'd say it "nice to avoid", i.e. don't apply such a patch.

MHO.



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux