[PATCH V3 for-next 0/7] Change IDR usage and locking in uverbs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Doug,

This series is the first part of introducing the new ABI.
It changes the IDR and locking mechanism in ib_uverbs.
This will allow handling all types in the same way, both IB/core
standard types and driver specific types. In that case, type is
referred to either IDR (such as QP, CQ, etc.) or FD such as
completion channel based uobjects.

Previously, we had global IDR tables per type and a per-type list on
the user context, which implied that the type system was fixed.
This patch series changes that into a uverbs-file IDR table (since
in the new proposed ABI each device could have a different type
system and a uverbs-file refers for a specific device) and a unified
per-context objects list. Objects in this list could either be IDR
based objects or FD based objects.

A type is represented by an identifier, an allocation size, a free
function which is used in the context tear down or object removal
and a release order. The allocation function is actually the
handler (for example, create_qp). The release order determines the
relations between objects. Some relations are set in the hardware
or user-space. For example, a MW could be created before MR, but
if this MW is bounded to the MR, the MW should be released before the
MR. Later on, we'll add actions in an object-oriented programming
manner to each type. So, a type contains all required information for
allocating its size and freeing it.

Since after this change a type contains all the information required
for freeing it, we refactor all the destroy handlers to just use our
infrastructure to destroy the type. The reason for the destruction
is passed to the destroy function.

The locking mechanism is changed as well. Previously, the uverbs_cmd
handlers created new objects themselves and dealt with the internals
of locking/unlocking them. This is now moved to a separate code which
either creates a new object, destroys an object or locks it for
read/write. This is possible since we have a descriptive type system.
Hence, we could have a shared code to create, lock and destroy types.

In contrast to the previous locking approach, we don't block the
user-space thread if an object is already locked, but we just return
-EBUSY and expect the user to handle this. In order to maintain
backward compatibility, we've added explicit locks to the uverbs_cmd
handlers (in non-racy scenarios of course), thus for sane flows the
behaviour should be the same as previous one.
The incentive here is to provide a robust infrastructure to add new
actions that can't create a dead-lock (e.g. two commands that try to
lock objects in AB-BA manner).
In addition, since objects creation and locking is dealt in a
centralized place, the new proposed ABI won't need to deal with it
explicitly in its handlers.

A typical flow of a handler will be:
1. Serialize the user-space command to kernel-space structures.
2. Get all objects the handle needs (through the new centralized
     mechanism).
3. Do the actual command.
4. Commit or abort objects allocation/fetching/destruction.
5. Write response to the user.

We use the following locks/krefs:
1. ib_uobject: usecnt (atomic)
	o Protects from concurrent read and write/destroy
	o The possible values are
		- (0):   Unlocked
		- (-1):  Locked for writing
		- (> 0): Locked for reading. The exact value is number
			 of readers.
2. ib_ucontext: uobjects_lock
	o Protect from concurrently modifying the context uobject's list
3. ib_uobject: ref [kref]
	o Let's a handler delay the memory release of an uobject even if
	  it was already removed from the ucontext's objects list
	  (e.g. CQ takes a reference count on the completion handler)
	o rdma_core takes one reference count for all objects in its
	  repository.
4. ib_uverbs_file: cleanup_mutex [existing lock]
	o Protects concurrently closing a file from releasing the ucontext.
	  Since when removing a uobject, we need to remove it from the
	  uobjects list and thus we use the ucontext lock (clause 2),
	  we need to make sure the ucontext is alive throughout this process.
5. ib_ucontext: cleanup_rwswm
	o Protects cleanup_context from concurrent lookup_get, remove_commit
	  and alloc_commit.
	o If alloc_commit is running during context cleanup, we remove the
	  object with a special reason.
	o It makes rdma_core more self contained and less dependant on
	  external locks.
6. ib_uobject_file: idr_lock
	o Protects concurrent write access to the idr.
	o RCU protects idr read access.
7. File reference count:
	o Protects race between lookup_get on a fd uobject and a release of
	  such an object.
	o If we get the file, we increased the reference on this file and we
	  won't get to the .release file operation.
	o Otherwise, the .release file operation is already called (or would
	  be called soon) and then the fd was already removed from the files
	  table of the process.

This patch-set is applied on top of Doug's k.o/for-4.12 branch.

Regards,
Matan

Changes from V2:
1. Rebased on top of Doug's latest branch
    a. Moved the cgroups code into the uobject infrastructure
2. Cleaned up rdma_lookup_get_uobject a bit

Changes from V1 - address Jason's comments:
1. Added ucontext->cleanup_rwsem to make the locking more self contained.
2. When calling a destroy handler, the object's free function is called
   with the appropriate reason.
3. Bug fix of double free when uobject file allocation failed.
4. Wrap the type->ops with appropriate inline function and macros
5. Reduce amount of macros.
6. Add lockdep style debugging and some other safety checks.
7. Replace a release function with needs_rcu bool flag.
8. Rebase the patches.

Changes from V0 - address some of Jason's comments:
1. Change idr table to be per uverbs_file and not per device.
2. Make changes more gradual
    a. Instead of doing a flags day approach, first introduce the idr
       uobjects with a new API and change their usages:
	 o Ditch the live flag
	 o Manage all idr objects through rdma_core
	 o create and cleanup context through rdma_core
    b. Add a lock to preserve old attch_mcast/detach_mcast behaviour.
    c. Add fd based objects infrastructure
    d. Change current completion channel to use this fd infrastructure.
3. Ditch the switch-case enums and favor a more OOP approach through a
   vtable.
4. Instead of having a kref-ed lock, put a new lock on the uverbs_file
   to synchronize fd objects deletion from context removal
    a. We favored this approach from keeping the ucontext alive, as the
       ucontext is allocated by the driver and thus we want to deallocate
       it through the same entity. We don't want to defer the whole
       deallocation process as in the current state we could rmmod the
       drivers code when applications are still running and trigger
       a disassociate flow.
5. Reduce the amount of macros - use only a macro for declaring idr
   types and a macro to declare fd types.
6. Use kref to manage all uobjects.
7. Use proper types for idr.

Matan Barak (7):
  IB/core: Refactor idr to be per uverbs_file
  IB/core: Add support for idr types
  IB/core: Add idr based standard types
  IB/core: Change idr objects to use the new schema
  IB/core: Add lock to multicast handlers
  IB/core: Add support for fd objects
  IB/core: Change completion channel to use the reworked objects schema

 drivers/infiniband/core/Makefile           |    3 +-
 drivers/infiniband/core/rdma_core.c        |  625 +++++++++++++
 drivers/infiniband/core/rdma_core.h        |   78 ++
 drivers/infiniband/core/uverbs.h           |   53 +-
 drivers/infiniband/core/uverbs_cmd.c       | 1388 ++++++++--------------------
 drivers/infiniband/core/uverbs_main.c      |  450 ++++-----
 drivers/infiniband/core/uverbs_std_types.c |  275 ++++++
 include/rdma/ib_verbs.h                    |   41 +-
 include/rdma/uverbs_std_types.h            |  114 +++
 include/rdma/uverbs_types.h                |  171 ++++
 10 files changed, 1863 insertions(+), 1335 deletions(-)
 create mode 100644 drivers/infiniband/core/rdma_core.c
 create mode 100644 drivers/infiniband/core/rdma_core.h
 create mode 100644 drivers/infiniband/core/uverbs_std_types.c
 create mode 100644 include/rdma/uverbs_std_types.h
 create mode 100644 include/rdma/uverbs_types.h

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux