On 7/5/2018 10:51 AM, Christopher Lameter wrote:
On Mon, 2 Jul 2018, Dennis Dalessandro wrote:
Omni-Path TID RDMA Feature
Intel Omni-Path (OPA) TID RDMA support is a feature that accelerates data
movement between two OPA nodes through the IB Verbs interface. It improves
RDMA READ/WRITE performance by delivering the data payload to a user
buffer directly without any software copying.
Well that is what RDMA already does and that is the reason RDMA
technology was implemented.
What does TID do? Searched for information about TID and could not find
much aside from vague statements in Intel manuals.
Basically, we have these KDETH packets (previously used only by PSM).
They carry a TID index value that maps to where the data should be
placed, so the HW can put it there directly.
Architecture
=============
The TID RDMA protocol is implemented at the hfi1 driver level and is
therefore transparent to the ULPs. It is designed to speed up the data
transfers for two specific RDMA requests:
- RDMA READ;
- RDMA WRITE.
Previously, when a verbs data packet was received at the destination (the
requester side for RDMA READ and the responder side for RDMA WRITE), the
data payload was copied to the user buffer by software, which slowed
performance significantly for large requests.
The RDMA technology that we do have here definitely does not use software
to copy the data.
Are we talking about a driver that falls back to software handling that
you are trying to fix?
hfi1 mostly implements verbs in software, same as qib. That is why we
have rdmavt: to pull the common parts of the software verbs
implementations together.
For TID RDMA requests, hardware resources (hardware flow and TID entries)
are allocated on the destination side (the requester side for TID RDMA
READ and the responder side for TID RDMA WRITE). The information for
these resources is conveyed to the data source side (the responder side
for TID RDMA READ and the requester side for TID RDMA WRITE) and embedded
in data packets. When data packets are received by the destination,
hardware will deliver the data payload to the destination buffer without
involving software and therefore improve the performance.
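The flow described above can be sketched as a toy model in C. This is
only an illustration under stated assumptions: the names (tid_entry,
toy_pkt, tid_alloc, tid_free, toy_hw_receive) and the table size are
hypothetical and are not the hfi1 driver's actual structures or API. The
memcpy() stands in for the placement that the hardware would perform
itself, which is the whole point of the feature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TID_TABLE_SIZE 8 /* hypothetical size, chosen for the example */

/* One entry of a toy "TID table": where payload for this index belongs. */
struct tid_entry {
	uint8_t *base;  /* start of the destination buffer region */
	size_t len;     /* number of bytes covered by this entry */
	int valid;      /* non-zero once the entry has been programmed */
};

/* Fields a data packet carries in this toy model (all hypothetical). */
struct toy_pkt {
	uint16_t tid_idx;        /* which table entry to use */
	uint32_t offset;         /* byte offset inside that entry */
	uint32_t len;            /* payload length in bytes */
	const uint8_t *payload;  /* the data itself */
};

struct tid_entry tid_table[TID_TABLE_SIZE];

/*
 * Destination side: program a free entry for the buffer and return its
 * index, or -1 if the table is full. The index is what would be conveyed
 * to the data source and embedded in its data packets.
 */
int tid_alloc(uint8_t *buf, size_t len)
{
	assert(buf != NULL);
	for (int i = 0; i < TID_TABLE_SIZE; i++) {
		if (!tid_table[i].valid) {
			tid_table[i].base = buf;
			tid_table[i].len = len;
			tid_table[i].valid = 1;
			return i;
		}
	}
	return -1;
}

/* Destination side: release an entry once the request has completed. */
void tid_free(int idx)
{
	if (idx >= 0 && idx < TID_TABLE_SIZE)
		tid_table[idx].valid = 0;
}

/*
 * Stand-in for the hardware receive path: look up the entry named in the
 * packet and place the payload at base + offset. Returns 0 on success and
 * -1 if the index is invalid or the payload would overrun the entry.
 */
int toy_hw_receive(const struct toy_pkt *pkt)
{
	if (pkt->tid_idx >= TID_TABLE_SIZE)
		return -1;

	const struct tid_entry *e = &tid_table[pkt->tid_idx];

	if (!e->valid || pkt->offset > e->len || pkt->len > e->len - pkt->offset)
		return -1;

	memcpy(e->base + pkt->offset, pkt->payload, pkt->len);
	return 0;
}
```

In this model the destination calls tid_alloc() on its buffer, hands the
returned index to the data source, and each arriving packet is placed by
toy_hw_receive() with no separate per-packet copy step in the driver.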
Well you register RDMA memory and thus reserve the resources and you may
call that allocation of resources too...
What in the world is this? Reimplementing RDMA on top of RDMA?
No. The simplest way to put it: this uses the HW features of the hfi
chip to do RDMA for verbs, where it was previously done in software.
We are letting verbs use HW offload features like PSM does, but doing it
under the covers, where it is invisible to the user/verbs-uapi.
-Denny