[PATCH rdma-next 00/12] GID reference counting series

From: Leon Romanovsky <leonro@xxxxxxxxxxxx>

From Parav:

This series implements reference counts for the GID table entries.

GID table entries for RoCE are based on the IP addresses of netdevices, and
default GID entries are based on the GUID.

For the IB link layer, they are based on the SM and provided by the HCA to
the stack. These entries can be added, deleted or replaced at any time in a
host. It is desirable not to delete a GID table entry while it is in use in
software flows such as CM message processing, destination MAC resolution
and RDMACM message processing, to name a few.

It is also desirable not to replace a GID table entry while the previous
GID entry is still being used in a software flow. In a GID table,
GID entries for RoCE can belong to multiple network namespaces.
Without holding a reference to the entry, a software flow might process a
CM message in two different network namespaces.

This series is a prerequisite for enabling namespaces for RoCE. It ensures
that a GID entry cannot be deleted while it is being used. When the last user
of the GID entry releases its reference, the GID entry is released in the
software cache and in the HCA.
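
The lifetime rule described above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions: the demo_* names and
the struct are hypothetical and are not the kernel's types or functions, the
counter is a plain integer rather than a kref, and there is no locking. It
also assumes that an entry marked for deletion is hidden from new lookups,
which the text above does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one GID table slot; not the kernel's type. */
struct demo_gid_entry {
	unsigned int refs;   /* the table's own reference plus one per active user */
	bool valid;          /* false once deletion has been requested */
	bool hw_programmed;  /* stands in for the entry's presence in the HCA */
};

/* Add an entry: the table itself holds the initial reference. */
static void demo_gid_add(struct demo_gid_entry *e)
{
	e->refs = 1;
	e->valid = true;
	e->hw_programmed = true;
}

/* A software flow takes a reference; fails once deletion has started
 * (an assumption of this sketch). */
static bool demo_gid_get(struct demo_gid_entry *e)
{
	if (!e->valid)
		return false;
	e->refs++;
	return true;
}

/* Drop a reference; the last put releases the entry from the modeled HCA. */
static void demo_gid_put(struct demo_gid_entry *e)
{
	assert(e->refs > 0);
	if (--e->refs == 0)
		e->hw_programmed = false;
}

/* Request deletion: hide the entry from new users and drop the table's
 * reference. The entry is only released once every user has put it. */
static void demo_gid_del(struct demo_gid_entry *e)
{
	if (!e->valid)
		return;
	e->valid = false;
	demo_gid_put(e);
}
```

In this model, a flow that called demo_gid_get() before demo_gid_del() keeps
the entry programmed until its own demo_gid_put(), which is the behaviour the
paragraph above asks for.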

GID entries are maintained by ib_core in a software cache for the IB,
RoCE and iWARP link layers. All GID search routines now refer to the GID
software cache, which gives ULP modules and CM modules a consistent
view of GID entries. Currently, RoCE GID table entries are reference counted
using kref. IB GID table entries are not yet reference counted; however,
the software interface is common to both link layers. The GID entry
get/put/hold APIs are no-ops for IB.

The series consists of patches for the following functionality:
1. Refactor code for cache handling of the RoCE GID entries
2. New APIs for GID get/put/hold operations
3. Change existing APIs and their users to follow the new API signatures

The series also removes the net+ifindex dependency from the path record entry
for RoCE. Both of those netdev fields can change, and storing the net
namespace without holding a reference can lead to a use-after-free crash.
The dependency is therefore removed. In the future, netdevice information
for RoCE will be provided via the referenced GID attribute in the ib_cm_req
entry.

Summary of APIs:
1. A caller that just wishes to query the GID based on device, port and index
should call rdma_query_gid().

2. A caller that wishes to search for and use the GID and/or use GID attributes
for a longer period of time in a software flow should use the rdma_find_* APIs.

3. A caller that knows the GID index and wants to refer to the GID and/or
GID attributes should call rdma_get_gid_attr() and must call
rdma_put_gid_attr() when done.

4. A caller with an existing GID reference pointer should call
rdma_hold_gid_attr() if it wishes to take an additional reference to the
GID attribute.

5. The "cached" prefix is dropped from most GID-related APIs, as all the APIs
provide information from the cache. The "ib" prefix is replaced with "rdma" to
match similar new and existing APIs.
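
To make the difference between items 1, 3 and 4 concrete, here is a userspace
sketch of the calling pattern. It is not the kernel API: every demo_* name,
the table layout and the return conventions are assumptions made for
illustration, and the real rdma_* functions take a device and port in
addition to the index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_TABLE_LEN 4

/* Hypothetical stand-ins for a GID and its attribute; not kernel types. */
struct demo_gid {
	unsigned char raw[16];
};

struct demo_gid_attr {
	struct demo_gid gid;
	unsigned int index;
	unsigned int refs;  /* 0 means the slot is empty */
};

static struct demo_gid_attr demo_table[DEMO_TABLE_LEN];

/* Populate a slot; the table holds the first reference. */
static void demo_table_set(unsigned int index, const struct demo_gid *gid)
{
	demo_table[index].gid = *gid;
	demo_table[index].index = index;
	demo_table[index].refs = 1;
}

/* Item 1: copy the GID value out by index; no reference is taken. */
static int demo_query_gid(unsigned int index, struct demo_gid *out)
{
	if (index >= DEMO_TABLE_LEN || demo_table[index].refs == 0)
		return -1;
	*out = demo_table[index].gid;
	return 0;
}

/* Item 3: return a referenced pointer; the caller must put it later. */
static const struct demo_gid_attr *demo_get_gid_attr(unsigned int index)
{
	if (index >= DEMO_TABLE_LEN || demo_table[index].refs == 0)
		return NULL;
	demo_table[index].refs++;
	return &demo_table[index];
}

/* Item 4: a holder of a valid pointer takes one more reference. */
static void demo_hold_gid_attr(const struct demo_gid_attr *attr)
{
	demo_table[attr->index].refs++;
}

/* Release one reference taken by get or hold. */
static void demo_put_gid_attr(const struct demo_gid_attr *attr)
{
	assert(demo_table[attr->index].refs > 0);
	demo_table[attr->index].refs--;
}
```

The point of the split is that a query hands back a copy that may go stale,
while get and hold hand back a pointer whose target stays alive until the
matching put.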

Thanks

Parav Pandit (12):
  IB/cm: Avoid AV ah_attr overwriting during LAP message handling
  IB/cm: Store and restore ah_attr during LAP msg processing
  IB/cm: Store and restore ah_attr during CM message processing
  IB/core: Refactor gid management code to store state and prop
    separately
  IB/core: Introduce GID entry reference counts for RoCE
  IB/core: Introduce GID attribute get, put and hold APIs
  RDMA: Use ib_gid_attr in query attributes
  IB/core: Use GID get and put reference APIs
  net/smc: Use GID get and put reference APIs
  IB: Simplify ib_query_gid and its users to drop gid attribute
  RDMA: Initialize and use sgid attribute from ah_attr
  RDMA: Use rdma_query_gid() instead of ib_get_cached_gid()

 drivers/infiniband/core/cache.c           | 490 +++++++++++++++++++++---------
 drivers/infiniband/core/cm.c              | 183 +++++++----
 drivers/infiniband/core/cma.c             |  86 ++++--
 drivers/infiniband/core/device.c          |  21 +-
 drivers/infiniband/core/mad.c             |   4 +-
 drivers/infiniband/core/multicast.c       |  19 +-
 drivers/infiniband/core/sa_query.c        |  78 +++--
 drivers/infiniband/core/sysfs.c           |  48 +--
 drivers/infiniband/core/user_mad.c        |   1 +
 drivers/infiniband/core/uverbs_cmd.c      |   4 +-
 drivers/infiniband/core/uverbs_marshall.c |   2 -
 drivers/infiniband/core/verbs.c           | 137 ++++++---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  |  83 +++--
 drivers/infiniband/hw/hns/hns_roce_ah.c   |  14 +-
 drivers/infiniband/hw/mlx4/ah.c           |  16 +-
 drivers/infiniband/hw/mlx4/main.c         |  15 +-
 drivers/infiniband/hw/mlx4/qp.c           |  32 +-
 drivers/infiniband/hw/mlx5/ah.c           |  11 +-
 drivers/infiniband/hw/mlx5/main.c         |  32 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h      |   6 +-
 drivers/infiniband/hw/mlx5/qp.c           |  10 +-
 drivers/infiniband/hw/mthca/mthca_av.c    |  12 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c  |  17 +-
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c  |  19 +-
 drivers/infiniband/hw/qedr/qedr_roce_cm.c |  14 +-
 drivers/infiniband/hw/qedr/verbs.c        |  16 +-
 drivers/infiniband/sw/rxe/rxe_av.c        |   3 +-
 drivers/infiniband/sw/rxe/rxe_loc.h       |   1 -
 drivers/infiniband/sw/rxe/rxe_net.c       |  51 ++--
 drivers/infiniband/sw/rxe/rxe_qp.c        |  26 +-
 drivers/infiniband/sw/rxe/rxe_recv.c      |  12 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c     |  11 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c |   8 +-
 drivers/infiniband/ulp/srp/ib_srp.c       |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c     |   3 +-
 include/rdma/ib_addr.h                    |   2 +
 include/rdma/ib_cache.h                   |  67 ++--
 include/rdma/ib_cm.h                      |   3 +
 include/rdma/ib_sa.h                      |  49 +--
 include/rdma/ib_verbs.h                   |  38 ++-
 net/smc/smc_core.c                        |  18 +-
 net/smc/smc_ib.c                          |  25 +-
 43 files changed, 983 insertions(+), 709 deletions(-)

--
2.14.3
