RE: [PATCH rdma-next 0/3] Support out of order data placement

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe@xxxxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, August 01, 2017 2:01 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: Tom Talpey <tom@xxxxxxxxxx>; Bart Van Assche
> <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx; dledford@xxxxxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein <idanb@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 0/3] Support out of order data placement
> 
> On Tue, Aug 01, 2017 at 06:14:08PM +0000, Parav Pandit wrote:
> 
> > >  Initial Condition VA=0 Data = 0
> > >  RDMA-W VA=0 Data=1
> > >  RDMA-R VA=0
> > >
> > > Spec says 1 must be returned, but sounds like this relaxed version could
> return 0.
> 
> > No. Table 76 stays as is as described before.
> 
> How is this possible?
I am not sure what more I can explain to you, Jason.
The requester-side HCA follows Table 76.
Incoming read responses are not processed until previous writes are ACKed, either implicitly (by read responses) or explicitly (by ACK packets).
This is the same as already described in the spec. No extra description is needed for this patchset.

> 
> > >  RDMA-W VA=0 Data=1
> > >  RDMA-W VA=0 Data=2
> > >  SEND
> > >
> > > Sounds like with the relaxed version the app could see 1 at SEND CQ time.
> > >
> > > So RDMA-W -> RDMA-W degrades to a F
> 
> > No. Table-76 is based on  how requester sees the execution.
> > So it stays as '#'.
> 
> How is this possible?
> 
Please don't mix requester-side ordering with responder-side execution.
C9-28 on the responder side is relaxed, as explained a few times before.

> You've clearly stated this feature allows out of order execution across packet
> boundaries, there is no way to know at the responder what the missed packets
> were, so inevitably, both of these cases must be possible. Or you are wrong
> about the statement on out of order.
Two examples of out-of-order execution have already been explained.
It appears to me that you are confusing requester-side and responder-side execution.
What more can I explain other than repeating:
(a) Table 76 on the requester side stays as is, and
(b) C9-28 is relaxed on the responder side.

> > > However, SEND WITH INVALIDATE is a special case that impacts the
> > > processing of work itself, not just the CPU observation, which is a
> > > bit outside what table 76 is talking about.
> 
> > SEND, SEND WITH IMM, SEND WITH INVALIDATE fall in the same category as
> send, as in the first column of Table 76.
> 
> Not really, I don't think you understand how this all fits together..
> 

> > > I'd advocate for allowing this to be out of order (but documented as
> > > such), as implicitly fencing SEND WITH INVALIDATE is not acceptable
> > > for performance and
> 
> > It is as per first column of Table-76.
> 
> Don't understand your remark, it is clearly ordered..
> 
I was saying that there is no change to send ordering on the requester side.

> > > most workloads using that feature do not care about this strict ordering.
> > >
> > nvme fabrics do care.
> > nvme fabrics target does RDMA-W, RDMA_S_INV sequence on the same
> memory key that is being used in RDMA-W without waiting for RDMA-W
> completion for good reason.
> > I recall SMB doing the same as well.
> > RDMA-S_INV after RDMA-W cannot break the order.
> 
> They don't care, because RDMA_WRITE, SEND_WITH_INVALIDATE on the same
> rkey does not try to write to the rkey memory twice, which is the only case
> where adding out of order execution really matters.
> 
It doesn't have to be an overlapping write to the same rkey.
One block I/O can translate to multiple RDMA-Ws from the target side, potentially to the same rkey.
One possibility is that the target code ran out of local SGEs or had fragmented memory.
So:
RDMA-W1 (key=A, VA=0x1000, len=16K, with 4 SGEs)
RDMA-W2 (key=A, VA=0x5000, len=16K, with 4 SGEs)
Send(Invalidate_key=A)
You do not want SEND_INVALIDATE to stop the DMA of RDMA-W1 at a later point.
So the Send CQE cannot arrive before the previous RDMA-W1 and RDMA-W2 are completed.
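
The sequence above can be sketched as a small Python model. This is only an illustration under stated assumptions: the WorkRequest class, ranges_overlap() and send_inv_is_safe() are hypothetical names, not verbs API calls, and the completion orders are made up for the example. It shows that the two writes cover adjacent, non-overlapping ranges under key A, and that a completion order is acceptable only if the invalidating send completes after both earlier writes on that key.

```python
from dataclasses import dataclass

KB = 1024


@dataclass(frozen=True)
class WorkRequest:
    """Hypothetical stand-in for a posted work request (not a verbs structure)."""
    name: str
    opcode: str      # "RDMA_WRITE" or "SEND_WITH_INV"
    rkey: str        # key written to, or key being invalidated
    va: int = 0
    length: int = 0


def ranges_overlap(a, b):
    """True if the [va, va + length) ranges of two requests intersect."""
    return a.va < b.va + b.length and b.va < a.va + a.length


def send_inv_is_safe(posted, completed_order):
    """True if every SEND_WITH_INV completes after all earlier-posted
    RDMA writes that use the key it invalidates."""
    position = {name: i for i, name in enumerate(completed_order)}
    for idx, wr in enumerate(posted):
        if wr.opcode != "SEND_WITH_INV":
            continue
        for earlier in posted[:idx]:
            if earlier.opcode == "RDMA_WRITE" and earlier.rkey == wr.rkey:
                if position[earlier.name] > position[wr.name]:
                    return False
    return True


# The example from the message: two 16K writes to key A, then invalidate A.
w1 = WorkRequest("W1", "RDMA_WRITE", "A", va=0x1000, length=16 * KB)
w2 = WorkRequest("W2", "RDMA_WRITE", "A", va=0x5000, length=16 * KB)
inv = WorkRequest("INV", "SEND_WITH_INV", "A")
posted = [w1, w2, inv]

print(ranges_overlap(w1, w2))                          # writes do not overlap
print(send_inv_is_safe(posted, ["W2", "W1", "INV"]))   # send last: acceptable
print(send_inv_is_safe(posted, ["W1", "INV", "W2"]))   # send before W2: not acceptable
```

In this model the two writes may complete in either order relative to each other, but the send with invalidate must come last.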

> They only care that the invalidate guarantees no DMA is possible once it reaches
> the receiver's CQ.
> 
> > > The requirement is really that by the time the SEND RCQ is seen that
> > > the INVALIDATE has taken effect.
> > >
> > Current Table-76 requirement already relaxes for
> > RDMA-R-> RDMA_S_INV.
> > However most users won't do above sequence because users would not like to
> fail duplicate read requests.
> > So let's continue with Table-76 for SEND as 2nd operation as defined
> > today. (first column stays as is)
> 
> As I said, table 76 does not really capture the full behavior of INVALIDATE.
> 
It covers only requester side.
Send with invalidate execution on responder side is described in 9.4.1.1.1

> The spec requires WRITE,INV,WRITE to fail, but it would be just fine for
> storage protocols if WRITE,INV,WRITE could succeed, so long as delivering the
> INV to the CQ fences the DMA, which can be done in a high performance way.
> Fencing the WRITE,INV,WRITE can not be done with high performance.
> 
You are proposing a different behavior and attribute, which may be done for an HCA that supports such a thing.
Please submit a separate patch for it whenever it's appropriate.
The current query HCA attribute is a bit field, to allow for future relaxation. Maybe what you described can be done.

> > > Atomic are basically similar, sounds like Atomic Op -> RDMA Read
> > > should degrade to a F as well. I'd say that is desired as well.
> >
> > No. Table-76 stays as is.
> > Atomic->Atomic is already 'F'.
> > Atomic->RDMA_R continues as '#'. (Similar to RDMA_W->RDMA_R).
> 
> Same argument as above, many apps will tolerate out of order for atomics, the
> default for an out of order mode should be to allow it, and let apps request in
> order with fence.
>
The following are already 'F':
Atomic->Atomic
Atomic->Write
Read->Atomic

Other out-of-order atomics, such as
Atomic->Read
Write->Atomic
may be done in the future under a different attribute.
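
The entries mentioned in this thread can be collected into a small lookup, sketched here in Python. The dictionary holds only the pairs as stated in this discussion and is not a copy of the spec's Table 76. It assumes '#' means order is always maintained and 'F' means order is maintained only when the second request is fenced. The names ORDERING_AS_STATED and needs_fence() are hypothetical.

```python
# (first operation, second operation) -> entry, exactly as stated in this thread.
# Assumption: '#' = order always maintained, 'F' = ordered only with a fence.
ORDERING_AS_STATED = {
    ("RDMA_WRITE", "RDMA_WRITE"): "#",
    ("RDMA_WRITE", "RDMA_READ"): "#",
    ("ATOMIC", "RDMA_READ"): "#",
    ("ATOMIC", "ATOMIC"): "F",
    ("ATOMIC", "RDMA_WRITE"): "F",
    ("RDMA_READ", "ATOMIC"): "F",
}


def needs_fence(first, second):
    """Return True if the thread lists the pair as 'F', False if '#',
    and None if the thread does not state an entry for the pair."""
    entry = ORDERING_AS_STATED.get((first, second))
    if entry is None:
        return None
    return entry == "F"


print(needs_fence("ATOMIC", "ATOMIC"))
print(needs_fence("RDMA_WRITE", "RDMA_READ"))
print(needs_fence("SEND", "SEND"))
```

Pairs not discussed here return None rather than a guess, so the lookup does not claim more than the thread states.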

Jason,
One single attribute is not a solution to all out-of-order needs.
