Re: linux rdma 3.14 merge plans

Sagi Grimberg <sagig@xxxxxxxxxxxxxxxxxx> · Wed, 22 Jan 2014 11:48:03 +0200

On 1/22/2014 2:43 AM, Roland Dreier wrote:
> On Tue, Jan 21, 2014 at 2:00 PM, Or Gerlitz <or.gerlitz@xxxxxxxxx> wrote:
>> Roland, ping! The signature patches were posted more than three
>> months ago. We deserve a response from the maintainer that goes
>> beyond "I need to think on that".
>>
>> Responsiveness was stated by Linus to be the #1 requirement from
>> kernel maintainers.

Hi Roland, I'll try to respond here.
Removing LKML and adding linux-scsi.

> Or, I'm not sure what response you're after from me.  Linus has also
> said that maintainers should say "no" a lot more
> (http://lwn.net/Articles/571995/) so maybe you want me to say, "No, I
> won't merge this patch set, since it adds a bunch of complexity to
> support a feature no one really cares about."

1. I disagree that no one cares about DIF/DIX. We are seeing growing
interest in it, especially for RDMA.
2. We put a lot of effort into avoiding complexity here and plugging in
as simply as possible.
An application that chooses to use DIF implements only 3 steps:
a. Allocate a signature-enabled MR.
b. Register the signature-enabled MR with DIF attributes (via post_send)
and then do RDMA.
c. Check the MR status after the transaction completes (a _lightweight_
verb that can be called from interrupt context).
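
As a rough illustration of the three steps above, here is a user-space
mock in C. Every name in it (mock_alloc_sig_mr, mock_reg_sig_mr,
mock_hw_flag_error, mock_check_mr_status, and both structs) is a
hypothetical stand-in, not the verbs from the patch set; the mock only
records state, whereas the real step b is a work request posted on the
send queue and the real errors are reported by the HCA.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins only: none of these names come from the patch set. */
struct mock_sig_attrs {
    uint16_t block_size;   /* bytes of data covered by one DIF tuple */
    uint32_t ref_tag_seed; /* reference tag of the first block */
};

struct mock_sig_mr {
    bool registered;             /* has step b been done? */
    struct mock_sig_attrs attrs; /* attributes bound at registration */
    bool sig_err;                /* set by the (mock) HW on a mismatch */
    uint64_t err_offset;         /* byte offset of the failing block */
};

/* Step a: allocate a signature-enabled MR. */
static struct mock_sig_mr *mock_alloc_sig_mr(void)
{
    return calloc(1, sizeof(struct mock_sig_mr));
}

/* Step b: bind DIF attributes to the MR. The mock only records them. */
static int mock_reg_sig_mr(struct mock_sig_mr *mr,
                           const struct mock_sig_attrs *attrs)
{
    if (!mr || !attrs || attrs->block_size == 0)
        return -1;
    mr->attrs = *attrs;
    mr->registered = true;
    mr->sig_err = false;
    return 0;
}

/* Stand-in for the HW flagging a bad block during the RDMA transfer. */
static void mock_hw_flag_error(struct mock_sig_mr *mr, uint64_t offset)
{
    mr->sig_err = true;
    mr->err_offset = offset;
}

/* Step c: status check. It only reads fields (no allocation, no sleeping).
 * Returns 0 if clean, 1 on a signature error, -1 if the MR is unusable. */
static int mock_check_mr_status(const struct mock_sig_mr *mr,
                                uint64_t *err_offset)
{
    if (!mr || !mr->registered)
        return -1;
    if (!mr->sig_err)
        return 0;
    if (err_offset)
        *err_offset = mr->err_offset;
    return 1;
}
```

A caller would run a, b, the RDMA operation, then c from its completion
handler, and fail the I/O if c reports an error.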

> Is that it?  (And yes I
> am skeptical about this stuff -- I work at an enterprise storage
> company and even here it's hard to find anyone who cares about
> DIF/DIX, especially offload features that stop it from being
> end-to-end)

1. RDMA verbs are _NOT_ stopping DIF from being end-to-end.
The OS (or SCSI in our specific case) passes the LLD two scatterlists:
data {block1, block2, block3, ...} and protection {DIF1, DIF2, DIF3, ...}.
The LLD is required to verify the data integrity (block guards) and to
interleave the two over the wire: {block1, DIF1, block2, DIF2, ...}.
This must be supported in HW. Would you rather have iSER/SRP do giant
copies to interleave by themselves? Or, when the OS asks the LLD to
INSERT DIF, have iSER/SRP compute a CRC for each data block? RDMA
storage ULPs are transports; they should have no business doing
data processing.

2. HW DIF offload also gives you protection across the PCI bus. Data
validation is also done (hopefully offloaded) when data+protection are
written to the back-end device, so end-to-end is preserved.

3. SAS & FC have T10-PI offload. This just adds RDMA to the game.
With this set of verbs, iSER, SRP, and FCoE initiators and targets will
be able to support T10-PI.

> I'm sure you're not expecting me to say, "Sure, I'll merge it without
> understanding the problem it's solving

Problem: T10-PI offload support for RDMA-based initiators, i.e.
supporting end-to-end data integrity while sustaining high RDMA
performance.

> or how it's doing that,"

How it's doing that:
- We introduce a new type of memory region that possesses protection
attributes suited for data integrity offload.
- We introduce a new fast registration method that can bind all the
relevant info for verifying/generating protection information:
  * describe if/how to interleave data with protection.
  * describe what method of data integrity is used (DIF type X, CRC,
XOR...) and the seeds that HW should start the calculation from.
  * describe how to verify the data.
- We introduce a new lightweight check of the data-integrity status to
see whether there were any integrity errors and get information on them.

Note: We made the MR allocation routine generic enough to lay a
framework for uniting all MR allocation methods (get_dma_mr,
alloc_fast_reg_mr, reg_phys, reg_user_mr, fmrs, and probably more in
the future...).
We defined ib_create_mr, which takes an mr_init_attr that can be easily
extended, as opposed to the specific calls that exist today.
So I would say this even reduces complexity.

Hope this helps,

Sagi.