Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe@xxxxxxxxxxxxxxxxxxxx]
> Sent: Saturday, July 22, 2017 4:27 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: Tom Talpey <tom@xxxxxxxxxx>; Bart Van Assche
> <Bart.VanAssche@xxxxxxxxxxx>; leon@xxxxxxxxxx; dledford@xxxxxxxxxx;
> linux-rdma@xxxxxxxxxxxxxxx; Idan Burstein <idanb@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 0/3] Support out of order data placement
>
> On Sat, Jul 22, 2017 at 05:32:05PM +0000, Parav Pandit wrote:
> > >
> > > Eg
> > >
> > > RDMA-W VA=0 Data=1
> > > RDMA-R VA=0
> > > RDMA-W VA=0 Data=2
> > >
> > > What does the read return? Spec says 1, but it sounds like this
> > > relaxed ordering could return 2.
> > >
> > Spec says Data=1 on RDMA-R only if Fence is set on read operation in
> > Table-76.
> > Otherwise duplicate read request after executing RDMA-W2 of Data=2 can
> > return Data=2 on read request.
>
> Erm, I should have written it like this
>
> Initial Condition VA=0 Data = 0
> RDMA-W VA=0 Data=1
> RDMA-R VA=0
>
> Spec says 1 must be returned, but sounds like this relaxed version could return 0.

No. Table-76 stays as is, as described before.

> So RDMA Write -> RDMA Read degrades to a F
>
> Similarly,
>
> RDMA-W VA=0 Data=1
> RDMA-W VA=0 Data=2
> SEND
>
> Sounds like with the relaxed version the app could see 1 at SEND CQ time.
>
> So RDMA-W -> RDMA-W degrades to a F

No. Table-76 is based on how the requester sees the execution. So it stays as '#'.

> > >
> > > What about
> > >
> > > RDMA-W VA=0 Data=1
> > > SEND WITH INVALIDATE VA=0
> > > RDMA-W VA=0 Data=2
> > >
> > > Spec says the second RDMA-W must fail,
> > Right.
> >
> > > but it sounds like this relaxed ordering would allow it to happen.
> > >
> > No. Table 76 is followed in this case.
> > 1st operation Write.
> > 2nd operation send.
> > There is implicit fence defined by '#' in Table, which is followed.
> > So 2nd RDMA-W continues to fail.
>
> So, I expect what is happening here is that the SEND RCQ is delayed until the
> sequence numbers catch up, eg guaranteeing that all packets prior to the SEND
> have been seen and committed to memory.
> Which is what table 76 is primarily talking about.
>
> However, SEND WITH INVALIDATE is a special case that impacts the processing
> of work itself, not just the CPU observation, which is a bit outside what table 76
> is talking about.
>

SEND, SEND WITH IMM and SEND WITH INVALIDATE all fall in the same category as SEND in the first column of Table-76.

> I'd advocate for allowing this to be out of order (but documented as such), as
> implicitly fencing SEND WITH INVALIDATE is not acceptable for performance and

It is as per the first column of Table-76.

> most workloads using that feature do not care about this strict ordering.
>

nvme fabrics do care.
The nvme fabrics target does an RDMA-W, RDMA_S_INV sequence on the same memory key that is being used in the RDMA-W, without waiting for the RDMA-W completion, for good reason.
I recall SMB doing the same as well.
RDMA_S_INV after RDMA-W cannot break the order.

> The requirement is really that by the time the SEND RCQ is seen that the
> INVALIDATE has taken effect.
>

The current Table-76 requirement already relaxes RDMA-R -> RDMA_S_INV.
However, most users won't do the above sequence because they would not like duplicate read requests to fail.
So let's continue with Table-76 for SEND as the 2nd operation as defined today. (The first column stays as is.)

> Atomic are basically similar, sounds like Atomic Op -> RDMA Read should
> degrade to a F as well. I'd say that is desired as well.

No. Table-76 stays as is.
Atomic->Atomic is already 'F'.
Atomic->RDMA_R continues as '#'. (Similar to RDMA_W->RDMA_R.)

> > I completely understand and agree that storage protocols who depend on
> > RDMA-CM would like to have this. But again, unavailability of this
> > bit in RDMA-CM is not a blocker for user land apps and let them start
> > using it.
> > Once RDMA-CM has it, storage kernel ULPs can also be
> > enabled.
>
> If there is no path to get this into the RDMA CM then it is just another vendor
> feature and it does not belong in the common API.

RDMA CM is not the only connection manager in use today.
Once RDMA CM has it, it can be extended there as well.
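
For reference, the Table-76 entries named in this thread can be written down as a small lookup. This is only a sketch in Python, not the spec table: it covers just the pairs mentioned above, it reads '#' as "order always maintained" and 'F' as "order maintained only if the second operation has the Fence indicator set", and the names `ORDERING`, `SEND_VARIANTS` and `order_guaranteed` are made up for illustration.

```python
# Sketch only: a partial model of the ordering entries named in this thread.
# '#' = order always maintained.
# 'F' = order maintained only if the second operation sets the Fence indicator.
# Pairs not mentioned in the thread are deliberately left out.

# All SEND flavours are treated as one category, as stated in the reply.
SEND_VARIANTS = {"SEND", "SEND_WITH_IMM", "SEND_WITH_INV"}

# (first operation, second operation) -> marker, as stated in the thread.
ORDERING = {
    ("RDMA_WRITE", "RDMA_WRITE"): "#",
    ("RDMA_WRITE", "RDMA_READ"): "#",
    ("RDMA_WRITE", "SEND"): "#",
    ("RDMA_READ", "SEND"): "F",
    ("ATOMIC", "ATOMIC"): "F",
    ("ATOMIC", "RDMA_READ"): "#",
}


def normalize(op):
    """Map every SEND flavour onto the single SEND category."""
    return "SEND" if op in SEND_VARIANTS else op


def order_guaranteed(first, second, fence_on_second=False):
    """Return True if 'second' is ordered after 'first' under this model.

    Raises KeyError for pairs that the thread does not discuss.
    """
    marker = ORDERING[(normalize(first), normalize(second))]
    return marker == "#" or fence_on_second
```

With this model, `order_guaranteed("RDMA_WRITE", "SEND_WITH_INV")` is true without any fence, which is the nvme fabrics target case, while `order_guaranteed("RDMA_READ", "SEND_WITH_INV")` is true only when `fence_on_second=True`.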