RE: [PATCH rdma-next 0/3] Support out of order data placement

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe@xxxxxxxxxxxxxxxxxxxx]
> Sent: Tuesday, August 01, 2017 6:38 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: Tom Talpey <tom@xxxxxxxxxx>; Bart Van Assche
> <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx; dledford@xxxxxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein <idanb@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 0/3] Support out of order data placement
> 
> On Tue, Aug 01, 2017 at 10:06:14PM +0000, Parav Pandit wrote:
> 
> > > > >  Initial Condition VA=0 Data = 0  RDMA-W VA=0 Data=1  RDMA-R
> > > > > VA=0
> > > > >
> > > > > Spec says 1 must be returned, but sounds like this relaxed
> > > > > version could
> > > return 0.
> > >
> > > > No. Table 76 stays as is as described before.
> > >
> > > How is this possible?
> 
> > I am not sure what more can I explain you Jason.
> > Requester side HCA follows HCA Table-76.
> > Incoming read responses are not processed until previous writes are
> > ACKed implicitly (in read responses) or explicitly by ACK packets.
> > Same as before described in spec. No extra description needed for this
> > patchset.
> 
> But doing that pretty much destroys much of the entire point of having a relaxed
> ordering :P
>
Probably not. I understand that having that would possibly be the ultimate goal.
I would like to add it too, whenever it is available.
Some applications are heavily write-driven or read-driven rather than mixing operations; those benefit from this attribute.
 
> > > > >  RDMA-W VA=0 Data=1
> > > > >  RDMA-W VA=0 Data=2
> > > > >  SEND
> > > > >
> > > > > Sounds like with the relaxed version the app could see 1 at SEND CQ time.
> > > > >
> > > > > So RDMA-W -> RDMA-W degrades to a F
> > >
> > > > No. Table-76 is based on  how requester sees the execution.
> > > > So it stays as '#'.
> > >
> > > How is this possible?
> > >
> > Please don't mix requester side ordering with responder side execution.
> > C9-28 on responder side is relaxed - as explained few times before.
> 
> No, I see what you are trying to say now.
Great.

> I disagree with this. Table-76 and C9-28
> are describing the same thing, you cannot weaken C9-28 without also restating
> Table 76.
>
The intent here is not to extend the definition of the fence bit beyond RDMA Reads.
What you are asking is that, when the ooo attribute is set and the user still wants in-order RDMA Writes for W1 and W2, the fence bit should be extended to cover that.
There are likely very few use cases where some operations need to follow ordering and others don't within a single QP.
A user that needs W1, W2 ordering is better off not setting this attribute on the QP.
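
As a rough illustration of the W1, W2 case above, here is a toy Python model (not spec text and not HCA code). The function name `possible_values_at_send` and the `ooo` flag are hypothetical, and the model assumes every write has been placed by the time the SEND completes; only the placement order varies.

```python
from itertools import permutations

def possible_values_at_send(writes, ooo):
    """Values the responder app may find at one VA once the SEND completes.

    writes: values carried by back-to-back RDMA Writes to the same VA,
            in posting order.
    ooo:    True models relaxed (out-of-order) placement on the responder.
    """
    if not writes:
        return set()
    if not ooo:
        # In-order placement: the last posted write always lands last.
        return {writes[-1]}
    # Relaxed placement: any of the writes may be the last to reach memory.
    return {order[-1] for order in permutations(writes)}

in_order = possible_values_at_send([1, 2], ooo=False)
relaxed = possible_values_at_send([1, 2], ooo=True)
print(in_order, relaxed)
```

In this sketch, a QP that needs W1, W2 ordering corresponds to `ooo=False`, which is the "do not set the attribute" option described above.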

> Table 76 is clearly talking about the entire system, including the execution and
> memory subsystem of the completer.
> 
> > It covers only requester side.
> > Send with invalidate execution on responder side is described in
> > 9.4.1.1.1
> 
> I suppose 9.4.1.1.1 point #1 already allows the out of order behavior.
>
It allows it, as indicated below.
(a) The paragraph above point #1. Snippet below.
"Since the invalidation operation is not executed
by the transport layer, the Invalidate operation may take place either
before or after the transport-level acknowledge has been generated"

So the responder can still send out the ACK while the invalidation is in progress or has not yet started.
While that is in progress, a new operation can still target the region being invalidated.
Depending on how slowly the invalidation proceeds, the subsequent operation's DMA can go through as well.
To my knowledge, most good adapters won't send out the ACK before invalidating, even though the spec allows it,
because doing so is very hard to debug and leaves a hole open for accidental memory corruption.
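
The ACK versus invalidate ordering can be sketched with a small Python model. This is an assumption-laden illustration, not spec behavior: `followup_hits_valid_region` is a hypothetical name, and the model takes the worst case where the follow-up operation reaches the responder immediately after the ACK is generated.

```python
def followup_hits_valid_region(responder_events):
    """Return True if a follow-up operation still finds the region valid.

    responder_events: the order in which the responder performs "ack" and
    "invalidate" for a Send with Invalidate.
    Assumption: the follow-up operation arrives right after the ACK.
    """
    region_valid = True
    for event in responder_events:
        if event == "invalidate":
            region_valid = False
        elif event == "ack":
            # The follow-up operation lands here, right behind the ACK.
            return region_valid
    raise ValueError("event list contains no ack")

ack_first = followup_hits_valid_region(["ack", "invalidate"])
invalidate_first = followup_hits_valid_region(["invalidate", "ack"])
print(ack_first, invalidate_first)
```

The `["ack", "invalidate"]` ordering is the one the quoted paragraph permits and the one that leaves the window open; the `["invalidate", "ack"]` ordering closes it.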

Also, 9.4.1.1.1 applies to subsequent operations where Send_Inv is the second operation.

The example of RDMA-W1, RDMA-W2, Send_Invalidate actually follows section 9.4.1.1.1 point #2.

> > You are proposing a different behavior and attribute which may be done
> > for a HCA that support such thing.  Please submit a different patch
> > for it whenever its appropriate.  Current query HCA attribute is bit
> > field for future relaxation. May be what you described can be done.
> 
> you do have to define them in a sensible and usefully broad way using the
> community process.
>
Sure. That's why we are discussing it here.
 
> > Other out of order atomics such as
> > Atomic->Read
> > Write->Atomic may be done in future under different attribute.
> 
> I think that is a mistake, you should start with them being out of order and
> require the app to fence to bring order back, even if the current HCAs execute
> them in order anyhow.
> 
I would agree for the Atomic->Read case because it's very similar to Read->Read.
Write->Atomic goes back to the first point, as atomic completions can trigger implicit Write completions.
So I will keep Write->Atomic out of this attribute.
Let me check with Idan what he thinks about applying the fence bit to Atomic->Read.
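
For reference, the usual fence semantics (a fenced work request does not start until earlier RDMA Reads and Atomics on the same QP have completed) can be sketched in Python as below. The function name `must_wait_for` and the tuple layout are hypothetical, and whether this attribute ends up requiring a fence for Atomic->Read is still the open question above.

```python
def must_wait_for(work_requests):
    """Map each WR index to the earlier WR indices a fence makes it wait for.

    work_requests: list of (op, fence) tuples in posting order, where op is
    one of "write", "read", "atomic", "send" and fence is a bool.
    Static model: earlier reads/atomics are treated as still outstanding.
    """
    outstanding = []  # indices of earlier RDMA Read / Atomic WRs
    waits = {}
    for index, (op, fence) in enumerate(work_requests):
        # A fenced WR waits for every earlier read/atomic; others wait for none.
        waits[index] = list(outstanding) if fence else []
        if op in ("read", "atomic"):
            outstanding.append(index)
    return waits

fenced = must_wait_for([("atomic", False), ("read", True)])
unfenced = must_wait_for([("atomic", False), ("read", False)])
print(fenced, unfenced)
```

With the fence set on the Read, the Atomic->Read pair is ordered again; without it, this model places no ordering between the two.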