[PATCH RFC 00/10] IB/core: SG IOCTL based RDMA ABI

The ideas presented here are based on our previous series, in addition to some
ideas presented at OFVWG, Sean's series, Linux Plumbers 2017 discussions and
other discussions held at the OpenFabrics Alliance 2017 conference.

This patch series adds an ioctl() interface alongside the existing write()
interface and provides an easy route to backport this change to legacy
supported systems. Analyzing the current uverbs role in dispatching and
parsing commands, we find that:
(a) uverbs validates the basic properties of the command.
(b) uverbs is responsible for all the IDR and uobject management and
    locking. It's also responsible for handling completion FDs.
(c) uverbs transforms the user<-->kernel ABI into the kernel API.

(a) and (b) are valid for every kABI. Although the nature of commands could
change, they still have to be validated and transformed into kernel pointers.
In order to avoid duplication between the various drivers, we would like to
keep (a) and (b) as shared code.

In addition, this is a good time to make the ABI more scalable, so we added a
few goals:
(1) Command attributes shall be easily extensible, either by allowing drivers
    to have their own extensible set of attributes or through extensible
    attributes in the core code.
(2) Each driver may have a specific type system (e.g. QP, CQ, ...) and could
    extend this type system in the future, while avoiding duplication of
    existing types or actions.

Thus, in order to allow this flexibility, we decided to provide (a) and (b) as
common infrastructure, but to drive the parsing and uobject management by
per-driver specifications. Handlers are also set by the drivers themselves
(they can point either to shared common code or to driver-specific code).

We introduce a hierarchical object-method-attributes structure. Adding an
entity to this hierarchy doesn't affect the rest of the interface.
Such a hierarchy could be rooted in a specific device and describe both the
common features and the features that are unique to this specific device.
This hierarchy is actually a per-device parsing tree, composed of three
layers: objects, actions (methods) and attributes. Each layer contains two
groups: common entities and hardware-specific entities. This way, a
device could add hardware-specific actions to a common object, it could
add hardware-specific objects, etc. Abstractions that are genuinely generic
should go to the common section. This means that we still need to be able to
pass optional parameters. In order to enable optional parameters, each command
is composed of a header and a set of TLVs that carry the attributes of this
command. The supported attribute classes are:
* PTR_IN (command) [for a small buffer, the data could be passed inline]
* PTR_OUT (response)
* IDR_OBJECT
* FD_OBJECT
We differentiate between blobs and objects in order to allow a generic piece of
code in the kernel to do some syntactic validation and translate the given
user object ID into a kernel structure. This could really help in sharing code
between different handlers.
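A minimal sketch of such a per-device parsing tree, written as a hypothetical user-space C model. Every type, field, function and ID below is an illustrative assumption, not the structures this series defines in include/rdma/uverbs_ioctl.h. Each layer keeps a common group and a hardware-specific group, so a device can attach a driver-private attribute to a common method without touching the common entries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a per-device parsing tree. Every name and ID here
 * is an illustrative assumption, not the series' actual structures.
 * Each layer keeps two groups: common entries and hardware-specific ones.
 */
enum { GRP_COMMON = 0, GRP_HW = 1, NUM_GRPS = 2 };
enum attr_class { PTR_IN, PTR_OUT, IDR_OBJECT, FD_OBJECT };

struct attr_spec {
	uint16_t id;
	enum attr_class cls;
};

struct method_spec {
	uint16_t id;
	const struct attr_spec *attrs[NUM_GRPS];
	size_t nattrs[NUM_GRPS];
};

struct object_spec {
	uint16_t id;
	const struct method_spec *methods[NUM_GRPS];
	size_t nmethods[NUM_GRPS];
};

/* Search both groups; adding an entry to one group never disturbs the other. */
static const struct method_spec *find_method(const struct object_spec *obj,
					      uint16_t id)
{
	for (int g = 0; g < NUM_GRPS; g++)
		for (size_t i = 0; i < obj->nmethods[g]; i++)
			if (obj->methods[g][i].id == id)
				return &obj->methods[g][i];
	return NULL;
}

static const struct attr_spec *find_attr(const struct method_spec *m,
					  uint16_t id)
{
	for (int g = 0; g < NUM_GRPS; g++)
		for (size_t i = 0; i < m->nattrs[g]; i++)
			if (m->attrs[g][i].id == id)
				return &m->attrs[g][i];
	return NULL;
}

/* Example: a common CQ "create" method extended with one driver attribute. */
static const struct attr_spec cq_create_common[] = {
	{ .id = 1, .cls = IDR_OBJECT },	/* handle of the new CQ */
	{ .id = 2, .cls = PTR_IN },	/* requested number of CQEs */
	{ .id = 3, .cls = PTR_OUT },	/* number of CQEs actually allocated */
};
static const struct attr_spec cq_create_hw[] = {
	{ .id = 0x1000, .cls = PTR_IN },	/* driver-private blob */
};
static const struct method_spec cq_methods[] = {
	{ .id = 1,
	  .attrs = { [GRP_COMMON] = cq_create_common, [GRP_HW] = cq_create_hw },
	  .nattrs = { [GRP_COMMON] = 3, [GRP_HW] = 1 } },
};
static const struct object_spec cq_object = {
	.id = 3,
	.methods = { [GRP_COMMON] = cq_methods },
	.nmethods = { [GRP_COMMON] = 1 },
};
```

With this layout, find_method(&cq_object, 1) followed by find_attr(m, 0x1000) resolves the hardware-specific attribute exactly the way the common attributes are resolved, which is what lets generic code validate both.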

Scatter-gather was chosen so that user-space drivers do not need to be
recompiled. By using pointers to driver-specific data, we can use that data
directly, without introducing extra copies and without changing the
user-space driver at all.
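The TLV command format and the scatter-gather choice can be sketched as follows. The wire layout, field widths and the 8-byte inline threshold are assumptions made for illustration, not the uapi added by this series. The point is that a PTR_IN attribute either carries a small value inline or simply points at an existing user-space buffer (for example, a driver's private command struct), so nothing has to be re-marshalled:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical command layout: a header followed by an array of
 * type-length-pointer attributes. Field names, widths and the inline
 * threshold are illustrative assumptions only.
 */
struct cmd_attr {
	uint16_t attr_id;
	uint16_t len;	/* size of the payload in bytes */
	uint32_t flags;
	uint64_t data;	/* user pointer, or the payload itself when inlined */
};

struct cmd_hdr {
	uint16_t length;	/* header plus attribute array, in bytes */
	uint16_t object_id;
	uint16_t method_id;
	uint16_t num_attrs;
	struct cmd_attr attrs[];
};

/* Fill a PTR_IN attribute: small payloads are inlined, larger ones are
 * referenced in place (scatter-gather), so the caller's buffer is not copied. */
static void set_attr(struct cmd_attr *a, uint16_t id, const void *buf,
		     uint16_t len)
{
	a->attr_id = id;
	a->len = len;
	a->flags = 0;
	a->data = 0;
	if (len <= sizeof(a->data))
		memcpy(&a->data, buf, len);
	else
		a->data = (uint64_t)(uintptr_t)buf;
}

/* Consumer side: locate the payload regardless of how it was passed. */
static const void *attr_payload(const struct cmd_attr *a)
{
	if (a->len <= sizeof(a->data))
		return &a->data;
	return (const void *)(uintptr_t)a->data;
}
```

In a real kernel the referenced buffer would still have to be read from user memory by whoever consumes it; this single-address-space sketch only shows that the user-space library passes the driver's buffer by reference instead of copying it into a new command layout.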

This series builds on the locking and IDR changes already accepted into
linux-rdma. Since types are no longer enforced by the common infrastructure,
there is no point in pre-allocating common IDR types in the common code.
Instead, we provide an API for drivers to add new types. We use one IDR per
context for all its IDR types. The driver declares all its supported types,
their free functions and their release order. After that, uobject management,
exclusive access and type handling are done automatically for the driver by
the infrastructure.
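A rough model of driver-declared types carrying a free function and a release order. This is a hypothetical user-space sketch (uobj_type, uctx, ctx_add and ctx_cleanup are illustrative names, not from the series): a single table per context stands in for the per-context IDR, and context teardown releases objects one release order at a time, so QPs are freed before the CQs they reference:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: driver-declared uobject types with a free function
 * and a release order. All names are illustrative, not the series' API.
 */
struct uobj_type {
	const char *name;
	int release_order;		/* lower values are released first */
	void (*free_fn)(void *object);
};

struct uobject {
	const struct uobj_type *type;
	void *object;
	int live;
};

/* One table per context stands in for the single per-context IDR. */
#define MAX_UOBJS 16
struct uctx {
	struct uobject uobjs[MAX_UOBJS];
	size_t n;
};

static int ctx_add(struct uctx *ctx, const struct uobj_type *t, void *obj)
{
	if (ctx->n == MAX_UOBJS)
		return -1;
	ctx->uobjs[ctx->n] = (struct uobject){ .type = t, .object = obj, .live = 1 };
	return (int)ctx->n++;	/* index acts as the user-visible handle */
}

/* Context teardown: release every live object, one release order at a time. */
static void ctx_cleanup(struct uctx *ctx, int max_order)
{
	for (int order = 0; order <= max_order; order++)
		for (size_t i = 0; i < ctx->n; i++) {
			struct uobject *u = &ctx->uobjs[i];

			if (u->live && u->type->release_order == order) {
				u->type->free_fn(u->object);
				u->live = 0;
			}
		}
}

/* Example driver types: QPs reference CQs, so QPs are released first. */
static char freed[MAX_UOBJS];	/* records release order for illustration */
static size_t nfreed;
static void free_qp(void *o) { (void)o; freed[nfreed++] = 'Q'; }
static void free_cq(void *o) { (void)o; freed[nfreed++] = 'C'; }
static const struct uobj_type qp_type = { "QP", 0, free_qp };
static const struct uobj_type cq_type = { "CQ", 1, free_cq };
```

Even if the CQ was created first, cleanup frees the QP before it, because ordering comes from the type declaration rather than from creation order.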

When putting the pieces together, we have a per-device parsing tree that
describes all the objects, actions and attributes a device supports, using a
descriptive language. A command is given by user space as a header plus an
array of Type-Length-Pointer/Object attributes. The ioctl callback executes
generic code that shares as much logic between the various verbs handlers as
possible. This generic code gets the command input from user space and, by
reading the device's parsing tree, syntactically validates it, grabs all
required objects, locks them, calls the right handler and then
commits/unlocks/rolls back the result, depending on the handler's result.
Such a flexible, extensible mechanism, which allows adding new common and
hardware-specific attributes to existing common entities as well as adding new
hardware-specific entities, greatly improves support for device diversity.
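The generic execution path (validate, grab and lock objects, call the handler, then commit or roll back) might look roughly like this. It is a simplified, single-threaded sketch with hypothetical names; it assumes the object IDs in the command were already resolved to uobjects and models locking and commit as plain flags rather than real locks and IDR updates:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical sketch of the generic ioctl path; all names are assumptions. */
struct uobj {
	int is_new;	/* created by this command */
	int locked;	/* exclusive access held for this command */
	int visible;	/* committed, i.e. reachable by later commands */
};

typedef int (*handler_fn)(struct uobj **objs, size_t n);

struct method {
	size_t num_mandatory;	/* attributes the parsing tree requires */
	handler_fn handler;	/* common or driver-specific code */
};

static int dispatch(const struct method *m, struct uobj **objs, size_t n)
{
	/* 1. Syntactic validation against the method's specification. */
	if (n < m->num_mandatory)
		return -EINVAL;

	/* 2. Lock every referenced object; undo partial locking on conflict. */
	for (size_t i = 0; i < n; i++) {
		if (objs[i]->locked) {
			while (i--)
				objs[i]->locked = 0;
			return -EBUSY;
		}
		objs[i]->locked = 1;
	}

	/* 3. Run the handler. */
	int ret = m->handler(objs, n);

	/* 4. Commit new objects on success, roll them back on failure, unlock. */
	for (size_t i = 0; i < n; i++) {
		if (objs[i]->is_new)
			objs[i]->visible = (ret == 0);
		objs[i]->locked = 0;
	}
	return ret;
}

static int ok_handler(struct uobj **objs, size_t n)
{
	(void)objs; (void)n;
	return 0;
}

static int failing_handler(struct uobj **objs, size_t n)
{
	(void)objs; (void)n;
	return -EIO;
}
```

Because handlers only ever see validated, locked objects and never deal with commit or rollback themselves, common and driver-specific handlers can share this single path.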

This series lays the foundations of such an infrastructure. It demonstrates a
few verbs handlers that use this new infrastructure for current features. We
don't demonstrate how to add device-specific features, but it's fairly simple:
just introduce a device-specific root, re-use all the sub-trees you need and
add or replace whatever your device requires (this could later be enhanced by
introducing a dynamic parse-tree merge).
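The device-specific root described above can be approximated by copying the common root and then adding or replacing object sub-trees by ID. The structures and build_device_root() are hypothetical, and the build is static; this does not implement the dynamic parse-tree merge mentioned as a possible enhancement:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: each obj_tree stands in for a whole object sub-tree. */
#define MAX_ROOT_OBJS 8

struct obj_tree {
	uint16_t id;
	const char *owner;
};

struct root {
	const struct obj_tree *objs[MAX_ROOT_OBJS];
	size_t n;
};

/* Re-use every common sub-tree, then replace same-ID entries or append new ones. */
static int build_device_root(struct root *out, const struct root *common,
			     const struct obj_tree *const *extra, size_t nextra)
{
	*out = *common;
	for (size_t e = 0; e < nextra; e++) {
		size_t i = 0;

		while (i < out->n && out->objs[i]->id != extra[e]->id)
			i++;
		if (i == out->n) {
			if (out->n == MAX_ROOT_OBJS)
				return -1;
			out->n++;
		}
		out->objs[i] = extra[e];
	}
	return 0;
}

/* Example sub-trees; IDs and owners are arbitrary. */
static const struct obj_tree common_pd = { 1, "common" };
static const struct obj_tree common_cq = { 2, "common" };
static const struct obj_tree hw_cq = { 2, "driver" };		/* replaces common CQ */
static const struct obj_tree hw_obj = { 0x1001, "driver" };	/* new object */
```

The common root itself is left untouched, so several devices can build their own roots from the same shared sub-trees.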

Future work should treat other uverbs-related subsystems (such as RDMA-CM)
similarly. When implementing this infrastructure for RDMA-CM, we may need to
replace ib_device with an ioctl_device and ib_ucontext with ioctl_context.

Another future enhancement is to use the parse tree to introduce an enhanced
query mechanism. Instead of having a bit for every new feature, we could allow
user space to read the parse tree and query whether types/actions/attributes
are actually supported by this particular device.
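A query of this kind could walk an exported copy of the parse tree. This sketch assumes a hypothetical flattened representation and a Q_ANY wildcard meaning "any entry at this level"; none of these names are part of the series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exported view of a device's parse tree; names are assumptions. */
#define Q_ANY 0xFFFFu	/* wildcard: don't check this level */

struct q_method { uint16_t id; const uint16_t *attrs; size_t nattrs; };
struct q_object { uint16_t id; const struct q_method *methods; size_t nmethods; };
struct q_tree { const struct q_object *objects; size_t nobjects; };

/* Is object/method/attribute supported? Pass Q_ANY to stop at a higher level. */
static bool tree_supports(const struct q_tree *t, uint16_t obj,
			  uint16_t method, uint16_t attr)
{
	for (size_t o = 0; o < t->nobjects; o++) {
		const struct q_object *ob = &t->objects[o];

		if (ob->id != obj)
			continue;
		if (method == Q_ANY)
			return true;
		for (size_t m = 0; m < ob->nmethods; m++) {
			const struct q_method *me = &ob->methods[m];

			if (me->id != method)
				continue;
			if (attr == Q_ANY)
				return true;
			for (size_t a = 0; a < me->nattrs; a++)
				if (me->attrs[a] == attr)
					return true;
			return false;
		}
		return false;
	}
	return false;
}

/* Example device: object 3 (CQ) with method 1 (create) taking attributes 1-3. */
static const uint16_t create_cq_attrs[] = { 1, 2, 3 };
static const struct q_method cq_q_methods[] = { { 1, create_cq_attrs, 3 } };
static const struct q_object q_objects[] = { { 3, cq_q_methods, 1 } };
static const struct q_tree dev_tree = { q_objects, 1 };
```

For example, tree_supports(&dev_tree, 3, 1, Q_ANY) asks only whether method 1 of object 3 exists, which is the kind of question a per-feature capability bit answers today.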

Regards,
Matan

Matan Barak (10):
  IB/core: Add a generic way to execute an operation on a uobject
  IB/core: Add support to finalize objects in one transaction
  IB/core: Add new ioctl interface
  IB/core: Declare a type instead of declaring only type attributes
  IB/core: Add DEVICE type and root types structure
  IB/core: Initialize uverbs types specification
  IB/core: Add macros for declaring actions and attributes
  IB/core: Add ability to explicitly destroy an uobject
  IB/core: Add uverbs types, actions, handlers and attributes
  IB/core: Expose ioctl interface through experimental Kconfig

 drivers/infiniband/Kconfig                   |    7 +
 drivers/infiniband/core/Makefile             |    2 +-
 drivers/infiniband/core/core_priv.h          |   14 +
 drivers/infiniband/core/rdma_core.c          |  174 +++++
 drivers/infiniband/core/rdma_core.h          |   39 +
 drivers/infiniband/core/uverbs.h             |    9 +
 drivers/infiniband/core/uverbs_cmd.c         |   21 +-
 drivers/infiniband/core/uverbs_ioctl.c       |  409 ++++++++++
 drivers/infiniband/core/uverbs_main.c        |    9 +
 drivers/infiniband/core/uverbs_std_types.c   | 1076 ++++++++++++++++++++++++--
 drivers/infiniband/hw/cxgb3/iwch_provider.c  |    5 +
 drivers/infiniband/hw/cxgb4/provider.c       |    5 +
 drivers/infiniband/hw/hns/hns_roce_main.c    |    5 +
 drivers/infiniband/hw/i40iw/i40iw_verbs.c    |    5 +
 drivers/infiniband/hw/mlx4/main.c            |    5 +
 drivers/infiniband/hw/mlx5/main.c            |    5 +
 drivers/infiniband/hw/mthca/mthca_provider.c |    5 +
 drivers/infiniband/hw/nes/nes_verbs.c        |    5 +
 drivers/infiniband/hw/ocrdma/ocrdma_main.c   |    5 +
 drivers/infiniband/hw/usnic/usnic_ib_main.c  |    5 +
 include/rdma/ib_verbs.h                      |    2 +
 include/rdma/uverbs_ioctl.h                  |  289 +++++++
 include/rdma/uverbs_std_types.h              |  212 ++++-
 include/rdma/uverbs_types.h                  |   39 +-
 include/uapi/rdma/ib_user_verbs.h            |   40 +
 include/uapi/rdma/rdma_user_ioctl.h          |   25 +
 26 files changed, 2315 insertions(+), 102 deletions(-)
 create mode 100644 drivers/infiniband/core/uverbs_ioctl.c
 create mode 100644 include/rdma/uverbs_ioctl.h

-- 
1.8.3.1
