Hi,

On Sunday, 17 May 2015 at 08:50 +0300, Haggai Eran wrote:
> Thanks again everyone for the review comments. I've updated the patch set
> accordingly. The main changes are in the first patch to use a read-write
> semaphore instead of an SRCU, and with the reference counting of shared
> ib_cm_ids.
> Please let me know if I missed anything, or if there are other issues with
> the series.
>
> Regards,
> Haggai
>
> Changes from v3:
> - Patch 1 and 3: use read-write semaphore instead of an SRCU.
> - Patch 5:
>   * Use a direct reference count instead of a kref.
>   * Instead of adding get/put pair for ib_cm_ids, just avoid destroying an
>     id when it is still in use.
>   * Squash these two patches together, since the first one became too
>     short:
>       IB/cm: Reference count ib_cm_ids
>       IB/cm: API to retrieve existing listening CM IDs
> - Rebase to Doug's to-be-rebased/for-4.2 branch.
>
> Changes from v2:
> - Add patch 1 to change device_mutex to an RCU.
> - Remove patch that fixed IPv4 connections to an IPv4/IPv6 listener.
> - Limit namespace related changes to RDMA CM and InfiniBand only.
> - Rebase on dledford/for-v4.2, with David Ahern's unaligned access patch.
>   * Use Michael Wang's capability functions where needed.
> - Move the struct net argument to be the first in all functions, to match
>   the networking core scheme.
> - Patch 2:
>   * Remove unwanted braces.
> - Patch 4: check the return value of ib_find_cached_pkey.
> - Patch 8: verify the address family before calling cm_save_ib_info.
> - Patch 10: use generic_net instead of a custom radix tree for having per
>   network namespace data.
> - Minor changes.
>
> Changes from v1:
> - Include patch 1 in this series.
> - Rebase for v4.1.
>
> Changes from v0:
> - Fix code review comments by Yann
> - Rebase on top of linux-3.19
>
> RDMA-CM uses IP based addressing and routing to set up RDMA connections
> between hosts.
> Currently, all of the IP interfaces and addresses used by the RDMA-CM
> must reside in the init_net namespace. This restricts the usage of
> containers with RDMA to only work with host network namespace (aka the
> kernel init_net NS instance).
>
> This patchset allows using network namespaces with the RDMA-CM.
>
> Each RDMA-CM id keeps a reference to a network namespace.
>
> This reference is based on the process network namespace at the time of the
> creation of the object or inherited from the listener.
>
> This network namespace is used to perform all IP and network related
> operations. Specifically, the local device lookup, as well as the remote GID
> address resolution are done in the context of the RDMA-CM object's
> namespace. This allows outgoing connections to reach the right target, even
> if the same IP address exists in multiple network namespaces. This can
> happen if each network namespace resides on a different P_Key.
>
> Additionally, the network namespace is used to split the listener service ID
> table. From the user point of view, each network namespace has a unique,
> completely independent table of service IDs. This allows running multiple
> instances of a single service on the same machine, using containers. To
> implement this, multiple RDMA CM IDs, belonging to different namespaces may
> now share their CM ID. When a request on such a CM ID arrives, the RDMA CM
> module finds out the correct namespaces and looks for the RDMA CM ID
> matching the request's parameters.
>
> The functionality introduced by this series would come into play when the
> transport is InfiniBand and IPoIB interfaces are assigned to each namespace.
> Multiple IPoIB interfaces can be created and assigned to different RDMA-CM
> capable containers, for example using pipework [1].
>
> Full support for RoCE will be introduced in a later stage.
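
As an aside on the service ID splitting described above, here is a minimal
userspace sketch in C of how it could be modelled. It is not code from the
series: struct shared_cm_id, struct listener, ns_listen(), ns_unlisten() and
ns_dispatch() are hypothetical names, the namespace is reduced to an integer,
and the lookup that maps an incoming request to a namespace (IP address,
device and P_Key in the series) is assumed to have already happened.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LISTENERS 8

/* Hypothetical stand-in for a shared ib_cm_id: one per service ID. */
struct shared_cm_id {
	uint64_t service_id;
	int refcount;	/* number of listeners currently sharing this CM ID */
};

/* Hypothetical stand-in for a listening rdma_cm_id bound to a namespace. */
struct listener {
	int netns;			/* opaque namespace identifier */
	struct shared_cm_id *cm;	/* NULL when the slot is free */
};

static struct shared_cm_id cm_ids[MAX_LISTENERS];
static struct listener listeners[MAX_LISTENERS];

/* Reuse the CM ID already listening on service_id, or set up a new one. */
static struct shared_cm_id *cm_id_get(uint64_t service_id)
{
	for (size_t i = 0; i < MAX_LISTENERS; i++) {
		if (cm_ids[i].refcount > 0 &&
		    cm_ids[i].service_id == service_id) {
			cm_ids[i].refcount++;
			return &cm_ids[i];
		}
	}
	for (size_t i = 0; i < MAX_LISTENERS; i++) {
		if (cm_ids[i].refcount == 0) {
			cm_ids[i].service_id = service_id;
			cm_ids[i].refcount = 1;
			return &cm_ids[i];
		}
	}
	return NULL;
}

/* Find the listener for (netns, service_id); NULL if there is none. */
static struct listener *ns_dispatch(int netns, uint64_t service_id)
{
	for (size_t i = 0; i < MAX_LISTENERS; i++) {
		struct listener *l = &listeners[i];

		if (l->cm && l->netns == netns &&
		    l->cm->service_id == service_id)
			return l;
	}
	return NULL;
}

/* A service ID may be bound once per namespace, not once per machine. */
static struct listener *ns_listen(int netns, uint64_t service_id)
{
	if (ns_dispatch(netns, service_id))
		return NULL;	/* already in use in this namespace */

	for (size_t i = 0; i < MAX_LISTENERS; i++) {
		struct listener *l = &listeners[i];

		if (l->cm)
			continue;
		l->cm = cm_id_get(service_id);
		if (!l->cm)
			return NULL;
		l->netns = netns;
		return l;
	}
	return NULL;
}

/* Drop one user; the shared CM ID slot is reusable once the count is 0. */
static void ns_unlisten(struct listener *l)
{
	assert(l->cm && l->cm->refcount > 0);
	l->cm->refcount--;
	l->cm = NULL;
}
```

In this model two namespaces can listen on the same service ID and end up
sharing one CM ID, while a second listener in the same namespace is refused.
The plain integer count mirrors the "avoid destroying an id when it is still
in use" approach from the v3 changelog only loosely; locking is left out
entirely.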

How does this play with iWarp? As iWarp HCAs are aware of IP addresses and
UDP/TCP ports, AFAIK, are those tied to a namespace with this patchset, or
will it be possible to use an iWarp HCA to access address/port resources
tied to a different namespace?

> The patches apply against Doug's tree for v4.2.
>
> The patchset is structured as follows:
>
> Patch 1 adds a read-write semaphore in addition to the device mutex in
> ib_core to allow traversing the client list without a deadlock in Patch 3.
>
> Patch 2 is a relatively trivial API extension, requiring the callers
> of certain ib_addr functions to provide a network namespace, as needed.
>
> Patches 3 and 4 add the ability to look up a network namespace according to
> the IP address, device and P_Key. It finds the matching IPoIB interfaces,
> and safely takes a reference on the network namespace before returning to
> the caller.
>
> Patches 5-6 make necessary changes to the CM layer, to allow sharing of a
> single CM ID between multiple RDMA CM IDs. This includes adding a reference
> count to ib_cm_id structs, adding an API to either create a new CM ID or use
> an existing one, and exposing the service ID to ib_cm clients.
>
> Patches 7-8 do some preliminary refactoring to the rdma_cm module. Patch 7
> refactors the logic that extracts the IP address from a connect request to
> allow reuse by the namespace lookup code further on. Patch 8 changes the
> way the RDMA CM module creates CM IDs, to avoid relying on the compare_data
> feature of ib_cm. This feature associates a single compare_data struct per
> ib_cm_id, so it cannot be used when sharing CM IDs.
>
> Patches 9-12 add proper namespace support to the RDMA-CM module. This
> includes adding multiple port space tables, sharing ib_cm_ids between
> rdma_cm_ids, adding a network namespace parameter, and finally retrieving
> the namespace from the creating process.
>

Regards.

--
Yann Droneaud
OPTEYA