On 04-Dec-18 14:04, Gal Pressman wrote:
> Hello all,
> The following patchset introduces the Elastic Fabric Adapter (EFA) driver, that
> was pre-announced by Amazon [1].
>
> EFA is a networking adapter designed to support user space network
> communication, initially offered in the Amazon EC2 environment. First release
> of EFA supports datagram send/receive operations and does not support
> connection-oriented or read/write operations.
>
> EFA supports unreliable datagrams (UD) as well as a new unordered, scalable
> reliable datagram protocol (SRD). SRD provides support for reliable datagrams
> and more complete error handling than typical RD, but, unlike RD, it does not
> support ordering nor segmentation. A new queue pair type, IB_QPT_SRD, is added
> to expose this new queue pair type.
> User verbs are supported via a dedicated userspace libfabric provider.
> Kernel verbs and in-kernel services are initially not supported.
>
> EFA enabled EC2 instances have two different devices allocated, one for ENA
> (netdev) and one for EFA, the two are separate pci devices with no in-kernel
> communication between them.
>
> Thanks,
> Gal
>
> [1] https://aws.amazon.com/about-aws/whats-new/2018/11/introducing-elastic-fabric-adapter/
>
> Gal Pressman (13):
>   RDMA: Add EFA related definitions
>   RDMA/efa: Add EFA device definitions
>   RDMA/efa: Add the PCI device id definitions
>   RDMA/efa: Add the efa.h header file
>   RDMA/efa: Add the efa_com.h file
>   RDMA/efa: Add the com service API definitions
>   RDMA/efa: Add the ABI definitions
>   RDMA/efa: Implement functions that submit and complete admin commands
>   RDMA/efa: Add com command handlers
>   RDMA/efa: Add bitmap allocation service
>   RDMA/efa: Add EFA verbs implementation
>   RDMA/efa: Add the efa module
>   RDMA/efa: Add driver to Kconfig/Makefile
>
>  MAINTAINERS                                     |    8 +
>  drivers/infiniband/Kconfig                      |    2 +
>  drivers/infiniband/core/verbs.c                 |    2 +
>  drivers/infiniband/hw/Makefile                  |    1 +
>  drivers/infiniband/hw/efa/Kconfig               |   14 +
>  drivers/infiniband/hw/efa/Makefile              |    8 +
>  drivers/infiniband/hw/efa/efa.h                 |  191 +++
>  drivers/infiniband/hw/efa/efa_admin_cmds_defs.h |  783 ++++++++++
>  drivers/infiniband/hw/efa/efa_admin_defs.h      |  135 ++
>  drivers/infiniband/hw/efa/efa_bitmap.c          |   76 +
>  drivers/infiniband/hw/efa/efa_com.c             | 1122 ++++++++++++++
>  drivers/infiniband/hw/efa/efa_com.h             |  139 ++
>  drivers/infiniband/hw/efa/efa_com_cmd.c         |  544 +++++++
>  drivers/infiniband/hw/efa/efa_com_cmd.h         |  217 +++
>  drivers/infiniband/hw/efa/efa_common_defs.h     |   17 +
>  drivers/infiniband/hw/efa/efa_main.c            |  669 +++++++++
>  drivers/infiniband/hw/efa/efa_pci_id_tbl.h      |   25 +
>  drivers/infiniband/hw/efa/efa_regs_defs.h       |  117 ++
>  drivers/infiniband/hw/efa/efa_verbs.c           | 1827 +++++++++++++++++++++++
>  include/rdma/ib_verbs.h                         |    9 +-
>  include/uapi/rdma/efa-abi.h                     |   89 ++
>  21 files changed, 5993 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/infiniband/hw/efa/Kconfig
>  create mode 100644 drivers/infiniband/hw/efa/Makefile
>  create mode 100644 drivers/infiniband/hw/efa/efa.h
>  create mode 100644 drivers/infiniband/hw/efa/efa_admin_cmds_defs.h
>  create mode 100644 drivers/infiniband/hw/efa/efa_admin_defs.h
>  create mode 100644 drivers/infiniband/hw/efa/efa_bitmap.c
>  create mode 100644 drivers/infiniband/hw/efa/efa_com.c
>  create mode 100644 drivers/infiniband/hw/efa/efa_com.h
>  create mode 100644 drivers/infiniband/hw/efa/efa_com_cmd.c
>  create mode 100644 drivers/infiniband/hw/efa/efa_com_cmd.h
>  create mode 100644 drivers/infiniband/hw/efa/efa_common_defs.h
>  create mode 100644 drivers/infiniband/hw/efa/efa_main.c
>  create mode 100644 drivers/infiniband/hw/efa/efa_pci_id_tbl.h
>  create mode 100644 drivers/infiniband/hw/efa/efa_regs_defs.h
>  create mode 100644 drivers/infiniband/hw/efa/efa_verbs.c
>  create mode 100644 include/uapi/rdma/efa-abi.h
>

Hi Jason,

It looks like the discussion didn't come to a conclusion. I'm trying to come
up with a plan going forward and would like to get your opinion.
I followed the comments and your concerns and I'll try to address them all:

Let me start by making clear that EFA is not an infiniband device, nor does
it aspire to be one, but I think it does fit the verbs model.

All technical comments will be fixed.

We can implement an rdma-core (libibverbs) userspace provider with support
for standard UD (including the 40-byte offset) and SRD QPs through direct
verbs. I'll also add documentation for the SRD QP type, even if we end up
using it as a driver QP type.

The create/destroy AH issue will be solved with the sleepable flag: EFA can
return -EOPNOTSUPP when called in an atomic context. When we add kernel
verbs, we can solve that the same way the bnxt driver did (polling for
completion).

The EFA wire protocol is tightly coupled to the wire protocol for EC2's VPC
software defined network, which Amazon considers one of its proprietary
differentiating features. We can't share many of the details of the wire
protocol as part of open sourcing the kernel driver, but are happy to share
details on any customer-visible features, such as guarantees around our SRD
protocol.
Since EFA is not designed to be used independently of EC2's VPC data plane,
we don't believe the lack of a well-documented wire protocol impacts
customers in any meaningful way.

Kernel verbs are not supported right now, but we do have future plans to
support them. I know future plans are probably not something you care for,
and I can't give a time frame right now, but it's not overlooked. We are
driven by our customers, and they have shown interest in this.

We are focused on our customer base, and due to our product offering we
haven't seen customer demand for nvmeof support, which you have set as the
bar for the RDMA subsystem. I'd really like to avoid implementing things
that do not interest our customers and will not have actual use.

I genuinely believe that EFA belongs in the RDMA subsystem, a lot more than
in vfio or anywhere else. We enforce PDs and MRs in the device, use standard
AH registration and remote QP number addressing, construct packet headers on
the device, etc. We are not simply hacking our way to userspace through the
subsystem. We can implement the driver in a different subsystem, but I truly
believe that no one will benefit from that.

Thanks,
Gal