From: Leon Romanovsky <leonro@xxxxxxxxxxxx>

From Parav:

This series implements reference counts for the GID table entries.

GID table entries for RoCE are based on the IP addresses of netdevices,
and default GID entries are based on the GUID. For the IB link layer
they are based on the SM and provided by the HCA to the stack. These
entries can be added, deleted or replaced at any time in a host.

It is desirable not to delete a GID table entry while it is in use in
various software flows such as CM message processing, resolving the
destination MAC and RDMACM message processing, to name a few. It is
also desirable not to replace a GID table entry while the previous GID
entry is still being used in a software flow.

In a GID table, GID entries for RoCE can belong to multiple network
namespaces. Without holding a reference to the entry, a software flow
might process a CM message in two different network namespaces.

This series is a prerequisite for enabling namespaces for RoCE. It
ensures that while a GID entry is being used, it cannot be deleted.
When the last user of the GID entry releases its reference, the GID
entry is released in the software cache and in the HCA.

GID entries are maintained by ib_core in a software cache for IB, RoCE
and iWarp link layers. All GID search routines now refer to the GID
software cache. This gives ULP modules and CM modules a consistent view
of GID entries.

Currently RoCE GID table entries are reference counted using kref. IB
GID table entries are not yet reference counted; however, the software
interface is common to both link layers. The GID entry get/put/hold
APIs for IB are no-ops.

The series consists of patches for the following functionality:
1. Refactor code for cache handling of the RoCE GID entries
2. New APIs for GID get/put/hold operations
3. Changing existing APIs and their users to follow the new API
   signature

The series also removes the dependency on net+ifindex in the path
record entry for RoCE.
Both of those fields of the netdev can change, and storing the net
namespace without holding a reference will lead to a use-after-free
crash. Therefore it is removed. Netdevice information for RoCE will be
provided via a referenced GID attribute in the ib_cm_req entry in the
future.

Summary of APIs:
1. A caller who just wishes to query the GID based on device, port and
   index should call rdma_query_gid().
2. A caller who wishes to search for and use the GID and/or use GID
   attributes for a longer period of time in a software flow should
   use the rdma_find_* APIs.
3. A caller who knows the GID index and wants to refer to the GID
   and/or GID attributes should call rdma_get_gid_attr() and must call
   rdma_put_gid_attr() to release the reference when done.
4. A caller with an existing GID reference pointer should call
   rdma_hold_gid_attr() if it wishes to increment the reference to the
   GID attribute.
5. The cached prefix is dropped from most GID-related APIs, as all the
   APIs provide information from the cache. The IB prefix is replaced
   with rdma to match similar new and existing APIs.

Thanks

Parav Pandit (12):
  IB/cm: Avoid AV ah_attr overwriting during LAP message handling
  IB/cm: Store and restore ah_attr during LAP msg processing
  IB/cm: Store and restore ah_attr during CM message processing
  IB/core: Refactor gid management code to store state and prop separately
  IB/core: Introduce GID entry reference counts for RoCE
  IB/core: Introduce GID attribute get, put and hold APIs
  RDMA: Use ib_gid_attr in query attributes
  IB/core: Use GID get and put reference APIs
  net/smc: Use GID get and put reference APIs
  IB: Simplify ib_query_gid and its users to drop gid attribute
  RDMA: Initialize and use sgid attribute from ah_attr
  RDMA: Use rdma_query_gid() instead of ib_get_cached_gid()

 drivers/infiniband/core/cache.c           | 490 +++++++++++++++++++++--------
 drivers/infiniband/core/cm.c              | 183 +++++++----
 drivers/infiniband/core/cma.c             |  86 ++++--
 drivers/infiniband/core/device.c          |  21 +-
 drivers/infiniband/core/mad.c             |   4 +-
 drivers/infiniband/core/multicast.c       |  19 +-
 drivers/infiniband/core/sa_query.c        |  78 +++--
 drivers/infiniband/core/sysfs.c           |  48 +--
 drivers/infiniband/core/user_mad.c        |   1 +
 drivers/infiniband/core/uverbs_cmd.c      |   4 +-
 drivers/infiniband/core/uverbs_marshall.c |   2 -
 drivers/infiniband/core/verbs.c           | 137 ++++++---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c  |  83 +++--
 drivers/infiniband/hw/hns/hns_roce_ah.c   |  14 +-
 drivers/infiniband/hw/mlx4/ah.c           |  16 +-
 drivers/infiniband/hw/mlx4/main.c         |  15 +-
 drivers/infiniband/hw/mlx4/qp.c           |  32 +-
 drivers/infiniband/hw/mlx5/ah.c           |  11 +-
 drivers/infiniband/hw/mlx5/main.c         |  32 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h      |   6 +-
 drivers/infiniband/hw/mlx5/qp.c           |  10 +-
 drivers/infiniband/hw/mthca/mthca_av.c    |  12 +-
 drivers/infiniband/hw/ocrdma/ocrdma_ah.c  |  17 +-
 drivers/infiniband/hw/ocrdma/ocrdma_hw.c  |  19 +-
 drivers/infiniband/hw/qedr/qedr_roce_cm.c |  14 +-
 drivers/infiniband/hw/qedr/verbs.c        |  16 +-
 drivers/infiniband/sw/rxe/rxe_av.c        |   3 +-
 drivers/infiniband/sw/rxe/rxe_loc.h       |   1 -
 drivers/infiniband/sw/rxe/rxe_net.c       |  51 ++--
 drivers/infiniband/sw/rxe/rxe_qp.c        |  26 +-
 drivers/infiniband/sw/rxe/rxe_recv.c      |  12 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c     |  11 +-
 drivers/infiniband/ulp/ipoib/ipoib_ib.c   |   3 +-
 drivers/infiniband/ulp/ipoib/ipoib_main.c |   8 +-
 drivers/infiniband/ulp/srp/ib_srp.c       |   2 +-
 drivers/infiniband/ulp/srpt/ib_srpt.c     |   3 +-
 include/rdma/ib_addr.h                    |   2 +
 include/rdma/ib_cache.h                   |  67 ++--
 include/rdma/ib_cm.h                      |   3 +
 include/rdma/ib_sa.h                      |  49 +--
 include/rdma/ib_verbs.h                   |  38 ++-
 net/smc/smc_core.c                        |  18 +-
 net/smc/smc_ib.c                          |  25 +-
 43 files changed, 983 insertions(+), 709 deletions(-)

--
2.14.3
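
As a rough illustration of the reference semantics described in this
cover letter (an entry cannot go away while a software flow is using it,
and it is released when the last reference is dropped), here is a minimal
single-threaded userspace sketch in C. Everything in it is an assumption
made for illustration: struct gid_entry, entry_init(), entry_get(),
entry_hold(), entry_put(), entry_del() and demo_flow() are hypothetical
names, not the rdma_*_gid_attr() kernel APIs, and a plain int stands in
for the kref that the series uses for RoCE entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; NOT the kernel's struct ib_gid_attr. */
struct gid_entry {
    int refcount;   /* the table's own reference plus in-flight users */
    bool in_table;  /* still visible to lookups */
    bool released;  /* set once the last reference is dropped */
};

/* A new entry starts with one reference, owned by the table. */
static void entry_init(struct gid_entry *e)
{
    e->refcount = 1;
    e->in_table = true;
    e->released = false;
}

/* Look up and take a reference; fails once the entry left the table. */
static struct gid_entry *entry_get(struct gid_entry *e)
{
    if (!e->in_table)
        return NULL;
    e->refcount++;
    return e;
}

/* Take an extra reference on an entry the caller already holds. */
static void entry_hold(struct gid_entry *e)
{
    e->refcount++;
}

/* Drop one reference; the entry is released when the count hits zero. */
static void entry_put(struct gid_entry *e)
{
    if (--e->refcount == 0)
        e->released = true;
}

/* Delete request: hide the entry and drop the table's reference. */
static void entry_del(struct gid_entry *e)
{
    e->in_table = false;
    entry_put(e);
}

/* Returns 0 if the entry outlived the delete until the flow's put. */
static int demo_flow(void)
{
    struct gid_entry e;
    struct gid_entry *ref;

    entry_init(&e);
    ref = entry_get(&e);            /* a flow starts using the entry */
    entry_del(&e);                  /* delete arrives mid-flow */
    assert(!e.released);            /* still alive: the flow holds it */
    assert(entry_get(&e) == NULL);  /* but new lookups no longer see it */
    entry_put(ref);                 /* last user done: now released */
    return e.released ? 0 : 1;
}
```

Calling demo_flow() from a main() exercises the delete-while-in-use
case. A real implementation would need atomic reference counting and
locking around the table, which this sketch deliberately leaves out.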