From: Parav Pandit <parav@xxxxxxxxxxxx>

ib_core: caching APIs
---------------------
There are a few GID query APIs, distinguished by their parameters and
filter usage. Some of them return a combination of port, index and/or
GID attributes. This patch brings uniformity to those query (find)
APIs: they now return a pointer to the GID attributes, which contain
all the relevant fields such as port, index, GID type and the optional
netdevice.

The ib_gid_attr pointer returned by these APIs is reference counted;
the reference is released using the previously introduced
rdma_put_gid_attr(). With this change there is no need to take a
netdev reference on every GID query, because the SGID attribute itself
is referenced.

The get/put/hold variants of the API already use the rdma_ prefix, so
the other find APIs are renamed from the ib_ to the rdma_ prefix as
well. Since all the GID APIs return entries from the cache, the
"cached" part of the names is dropped.

ib_cm:
------
The ib_cm module keeps the sgid_attr in the primary and alternate path
address vectors and releases the reference when the ib_cm_id is
destroyed. ah_attr holds a reference to the GID attribute whenever it
is initialized from a work completion or a path record entry. In the
CM request processing path, ah_attr is initialized twice for RoCE and
for IB (with GRH): first from the wc and then from the path record. In
that case the reference held by the wc-based ah_attr is released once
the path-record-based ah_attr initialization succeeds.

rdma_cm:
--------
The rdma_cm module keeps the sgid_attr in rdma_dev_addr and likewise
releases the reference when the rdma_cm_id is destroyed. ib_cm and
rdma_cm communicate the SGID attributes via ib_cm_event for the IB and
RoCE link layers. For IB, gid_attr is optional: it is present only
when a GRH is present in the CM messages.

This also avoids the race condition mentioned in [1], where a GID
lookup based on Ethernet parameters and a subsequent GID query may map
to different netdevices.

This patch also simplifies the path record structure for RoCE as
described below. The path record contained a struct net pointer and
the ifindex of the netdevice, which has two problems:
1. Since most RDMA processing does not happen in NAPI context, the
   ifindex can change while processing is in progress.
2. The net pointer is stored without acquiring a reference. Such a
   design can crash the kernel once the net pointer becomes invalid;
   it was only safe because the pointer was always initialized to
   init_net.

In order to support processing the entry in the namespace of the
arrived packet, such conditions must be avoided. This patch removes
the dependency on the net pointer and ifindex; instead it relies on
the SGID attribute, which contains a pointer to the netdev.

Even though this patch is limited to operating in init_net, in the
future ib_cm_req_event_param and ib_cm_sidr_req_event_param will be
extended to carry the ib_gid_attr of the SGID. This will enable
processing CM packets in the net namespace of the netdev to which the
SGID belongs, without taking any additional reference to the net
namespace in the packet processing path. This patch continues to
restrict route resolution to the init_net namespace.
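For illustration only (not part of the diff below), a minimal sketch of
how callers are expected to use the reworked, reference-counted lookup
and the new ah_attr cleanup helper; the example_* functions and the
chosen GID type are hypothetical:

  #include <linux/err.h>
  #include <rdma/ib_verbs.h>
  #include <rdma/ib_cache.h>
  #include <rdma/ib_sa.h>

  /* Hypothetical consumer: look up a RoCE GID and use the referenced
   * attribute instead of a bare table index.
   */
  static int example_gid_lookup(struct ib_device *device, u8 port_num,
                                const union ib_gid *gid,
                                struct net_device *ndev)
  {
          const struct ib_gid_attr *attr;

          attr = rdma_find_gid_by_port(device, gid,
                                       IB_GID_TYPE_ROCE_UDP_ENCAP,
                                       port_num, ndev);
          if (IS_ERR(attr))
                  return PTR_ERR(attr);

          /* attr->index, attr->gid_type and attr->ndev stay valid while
           * the reference is held; no separate dev_hold()/dev_put() on
           * attr->ndev is needed.
           */
          pr_debug("GID found at index %u, type %d\n",
                   attr->index, attr->gid_type);

          rdma_put_gid_attr(attr);
          return 0;
  }

  /* Hypothetical consumer: an ah_attr initialized from a path record
   * now holds an SGID attribute reference the caller must release.
   */
  static int example_init_ah_attr(struct ib_device *device, u8 port_num,
                                  struct sa_path_rec *rec,
                                  const struct ib_gid_attr *sgid_attr)
  {
          struct rdma_ah_attr ah_attr;
          int ret;

          ret = ib_init_ah_attr_from_path(device, port_num, rec,
                                          &ah_attr, sgid_attr);
          if (ret)
                  return ret;

          /* ... use ah_attr, e.g. rdma_create_ah(pd, &ah_attr) ... */

          rdma_cleanup_ah_attr_gid_attr(&ah_attr);
          return 0;
  }

Compared to the old ib_find_cached_gid_by_port()/ib_get_cached_gid()
flow, the caller no longer copies the attribute or drops the netdevice
reference separately; the single rdma_put_gid_attr() or
rdma_cleanup_ah_attr_gid_attr() call releases the reference.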
[1] https://www.spinics.net/lists/linux-rdma/msg58148.html Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> --- drivers/infiniband/core/cache.c | 153 +++++++++++++++--------------- drivers/infiniband/core/cm.c | 126 ++++++++++++++---------- drivers/infiniband/core/cma.c | 79 +++++++++------ drivers/infiniband/core/multicast.c | 19 ++-- drivers/infiniband/core/sa_query.c | 78 +++++++++------ drivers/infiniband/core/user_mad.c | 1 + drivers/infiniband/core/uverbs_marshall.c | 2 - drivers/infiniband/core/verbs.c | 113 ++++++++++++++-------- drivers/infiniband/sw/rxe/rxe_recv.c | 12 ++- drivers/infiniband/ulp/ipoib/ipoib_main.c | 4 +- include/rdma/ib_addr.h | 2 + include/rdma/ib_cache.h | 39 ++++---- include/rdma/ib_cm.h | 3 + include/rdma/ib_sa.h | 49 +--------- include/rdma/ib_verbs.h | 34 ++++++- 15 files changed, 401 insertions(+), 313 deletions(-) diff --git a/drivers/infiniband/core/cache.c b/drivers/infiniband/core/cache.c index 5d4fa1b55448..d7a8fd425261 100644 --- a/drivers/infiniband/core/cache.c +++ b/drivers/infiniband/core/cache.c @@ -618,12 +618,13 @@ static int __ib_cache_gid_get(struct ib_device *ib_dev, u8 port, int index, return 0; } -static int _ib_cache_gid_table_find(struct ib_device *ib_dev, - const union ib_gid *gid, - const struct ib_gid_attr *val, - unsigned long mask, - u8 *port, u16 *index) +static const struct ib_gid_attr * +_ib_cache_gid_table_find(struct ib_device *ib_dev, + const union ib_gid *gid, + const struct ib_gid_attr *val, + unsigned long mask) { + const struct ib_gid_attr *attr; struct ib_gid_table *table; u8 p; int local_index; @@ -634,24 +635,22 @@ static int _ib_cache_gid_table_find(struct ib_device *ib_dev, read_lock_irqsave(&table->rwlock, flags); local_index = find_gid(table, gid, val, false, mask, NULL); if (local_index >= 0) { - if (index) - *index = local_index; - if (port) - *port = p + rdma_start_port(ib_dev); + attr = get_gid_attr_locked(ib_dev, p, + table, local_index); read_unlock_irqrestore(&table->rwlock, flags); - return 0; + return attr; } read_unlock_irqrestore(&table->rwlock, flags); } - return -ENOENT; + return ERR_PTR(-ENOENT); } -static int ib_cache_gid_find(struct ib_device *ib_dev, - const union ib_gid *gid, - enum ib_gid_type gid_type, - struct net_device *ndev, u8 *port, - u16 *index) +static const struct ib_gid_attr * +ib_cache_gid_find(struct ib_device *ib_dev, + const union ib_gid *gid, + enum ib_gid_type gid_type, + struct net_device *ndev) { unsigned long mask = GID_ATTR_FIND_MASK_GID | GID_ATTR_FIND_MASK_GID_TYPE; @@ -660,38 +659,40 @@ static int ib_cache_gid_find(struct ib_device *ib_dev, if (ndev) mask |= GID_ATTR_FIND_MASK_NETDEV; - return _ib_cache_gid_table_find(ib_dev, gid, &gid_attr_val, - mask, port, index); + return _ib_cache_gid_table_find(ib_dev, gid, &gid_attr_val, mask); } /** - * ib_find_cached_gid_by_port - Returns the GID table index where a specified - * GID value occurs. It searches for the specified GID value in the local - * software cache. + * rdma_find_gid_by_port - Returns the GID entry attributes when it finds + * valid GID entry for given search parameters. It searches for the specified + * GID value in the local software cache. * @device: The device to query. * @gid: The GID value to search for. * @gid_type: The GID type to search for. * @port_num: The port number of the device where the GID value should be * searched. * @ndev: In RoCE, the net device of the device. Null means ignore. 
- * @index: The index into the cached GID table where the GID was found. This - * parameter may be NULL. + * + * Returns sgid attributes if the GID is found with valid reference. + * Or returns ERR_PTR for the error. + * Caller must invoke ib_put_cached_gid_attr() to release the reference. */ -int ib_find_cached_gid_by_port(struct ib_device *ib_dev, - const union ib_gid *gid, - enum ib_gid_type gid_type, - u8 port, struct net_device *ndev, - u16 *index) +const struct ib_gid_attr * +rdma_find_gid_by_port(struct ib_device *ib_dev, + const union ib_gid *gid, + enum ib_gid_type gid_type, + u8 port, struct net_device *ndev) { int local_index; struct ib_gid_table *table; unsigned long mask = GID_ATTR_FIND_MASK_GID | GID_ATTR_FIND_MASK_GID_TYPE; struct ib_gid_attr val = {.ndev = ndev, .gid_type = gid_type}; + const struct ib_gid_attr *attr; unsigned long flags; if (!rdma_is_port_valid(ib_dev, port)) - return -ENOENT; + return ERR_PTR(-ENOENT); table = ib_dev->cache.ports[port - rdma_start_port(ib_dev)].gid; @@ -701,20 +702,20 @@ int ib_find_cached_gid_by_port(struct ib_device *ib_dev, read_lock_irqsave(&table->rwlock, flags); local_index = find_gid(table, gid, &val, false, mask, NULL); if (local_index >= 0) { - if (index) - *index = local_index; + attr = get_gid_attr_locked(ib_dev, port, + table, local_index); read_unlock_irqrestore(&table->rwlock, flags); - return 0; + return attr; } read_unlock_irqrestore(&table->rwlock, flags); - return -ENOENT; + return ERR_PTR(-ENOENT); } -EXPORT_SYMBOL(ib_find_cached_gid_by_port); +EXPORT_SYMBOL(rdma_find_gid_by_port); /** - * ib_cache_gid_find_by_filter - Returns the GID table index where a specified - * GID value occurs + * ib_cache_gid_find_by_filter - Returns the GID table attribute where a + * specified GID value occurs * @device: The device to query. * @gid: The GID value to search for. * @port_num: The port number of the device where the GID value could be @@ -724,32 +725,29 @@ EXPORT_SYMBOL(ib_find_cached_gid_by_port); * otherwise, we continue searching the GID table. It's guaranteed that * while filter is executed, ndev field is valid and the structure won't * change. filter is executed in an atomic context. filter must not be NULL. - * @index: The index into the cached GID table where the GID was found. This - * parameter may be NULL. * * ib_cache_gid_find_by_filter() searches for the specified GID value * of which the filter function returns true in the port's GID table. * This function is only supported on RoCE ports. 
* */ -static int ib_cache_gid_find_by_filter(struct ib_device *ib_dev, - const union ib_gid *gid, - u8 port, - bool (*filter)(const union ib_gid *, - const struct ib_gid_attr *, - void *), - void *context, - u16 *index) +static const struct ib_gid_attr * +ib_cache_gid_find_by_filter(struct ib_device *ib_dev, + const union ib_gid *gid, + u8 port, + bool (*filter)(const union ib_gid *, + const struct ib_gid_attr *, + void *), + void *context) { + const struct ib_gid_attr *gid_attr = ERR_PTR(-ENOENT); struct ib_gid_table *table; unsigned int i; unsigned long flags; - bool found = false; - if (!rdma_is_port_valid(ib_dev, port) || !rdma_protocol_roce(ib_dev, port)) - return -EPROTONOSUPPORT; + return ERR_PTR(-EPROTONOSUPPORT); table = ib_dev->cache.ports[port - rdma_start_port(ib_dev)].gid; @@ -766,17 +764,12 @@ static int ib_cache_gid_find_by_filter(struct ib_device *ib_dev, memcpy(&attr, &table->data_vec[i].attr, sizeof(attr)); if (filter(gid, &attr, context)) { - found = true; - if (index) - *index = i; + gid_attr = get_gid_attr_locked(ib_dev, port, table, i); break; } } read_unlock_irqrestore(&table->rwlock, flags); - - if (!found) - return -ENOENT; - return 0; + return gid_attr; } static struct ib_gid_table *alloc_gid_table(int sz) @@ -981,45 +974,47 @@ int ib_get_cached_gid(struct ib_device *device, EXPORT_SYMBOL(ib_get_cached_gid); /** - * ib_find_cached_gid - Returns the port number and GID table index where - * a specified GID value occurs. + * rdma_find_gid - Returns SGID attributes if the matching GID is found. * @device: The device to query. * @gid: The GID value to search for. * @gid_type: The GID type to search for. * @ndev: In RoCE, the net device of the device. NULL means ignore. - * @port_num: The port number of the device where the GID value was found. - * @index: The index into the cached GID table where the GID was found. This - * parameter may be NULL. * - * ib_find_cached_gid() searches for the specified GID value in + * rdma_find_gid() searches for the specified GID value in * the local software cache. + * + * Returns sgid attributes if the GID is found with valid reference. + * Or returns ERR_PTR for the error. + * Caller must invoke ib_put_cached_gid_attr() to release the reference. 
+ * */ -int ib_find_cached_gid(struct ib_device *device, - const union ib_gid *gid, - enum ib_gid_type gid_type, - struct net_device *ndev, - u8 *port_num, - u16 *index) + +const struct ib_gid_attr * +rdma_find_gid(struct ib_device *device, + const union ib_gid *gid, + enum ib_gid_type gid_type, + struct net_device *ndev) { - return ib_cache_gid_find(device, gid, gid_type, ndev, port_num, index); + return ib_cache_gid_find(device, gid, gid_type, ndev); } -EXPORT_SYMBOL(ib_find_cached_gid); +EXPORT_SYMBOL(rdma_find_gid); -int ib_find_gid_by_filter(struct ib_device *device, - const union ib_gid *gid, - u8 port_num, - bool (*filter)(const union ib_gid *gid, - const struct ib_gid_attr *, - void *), - void *context, u16 *index) +const struct ib_gid_attr * +rdma_find_gid_by_filter(struct ib_device *device, + const union ib_gid *gid, + u8 port_num, + bool (*filter)(const union ib_gid *gid, + const struct ib_gid_attr *, + void *), + void *context) { /* Only RoCE GID table supports filter function */ if (!rdma_protocol_roce(device, port_num) && filter) - return -EPROTONOSUPPORT; + return ERR_PTR(-EPROTONOSUPPORT); return ib_cache_gid_find_by_filter(device, gid, port_num, filter, - context, index); + context); } int ib_get_cached_pkey(struct ib_device *device, diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c index 7df4c7173607..0efad89cfe22 100644 --- a/drivers/infiniband/core/cm.c +++ b/drivers/infiniband/core/cm.c @@ -474,6 +474,11 @@ static int cm_init_av_for_lap(struct cm_port *port, struct ib_wc *wc, if (ret) return ret; + /* + * Now that ah attribute is initialized based new wc, + * old ah attribute can be discarded. + */ + rdma_cleanup_ah_attr_gid_attr(&av->ah_attr); memcpy(&av->ah_attr, &new_ah_attr, sizeof(new_ah_attr)); return 0; } @@ -508,31 +513,50 @@ static int add_cm_id_to_port_list(struct cm_id_private *cm_id_priv, return ret; } -static struct cm_port *get_cm_port_from_path(struct sa_path_rec *path) +static struct cm_port * +get_cm_port_from_path(struct sa_path_rec *path, const struct ib_gid_attr *attr) { struct cm_device *cm_dev; struct cm_port *port = NULL; unsigned long flags; - u8 p; - struct net_device *ndev = ib_get_ndev_from_path(path); - - read_lock_irqsave(&cm.device_lock, flags); - list_for_each_entry(cm_dev, &cm.device_list, list) { - if (!ib_find_cached_gid(cm_dev->ib_device, &path->sgid, - sa_conv_pathrec_to_gid_type(path), - ndev, &p, NULL)) { - port = cm_dev->port[p - 1]; - break; + + if (attr) { + read_lock_irqsave(&cm.device_lock, flags); + list_for_each_entry(cm_dev, &cm.device_list, list) { + if (cm_dev->ib_device == attr->device) { + port = cm_dev->port[attr->port_num - 1]; + break; + } } + read_unlock_irqrestore(&cm.device_lock, flags); + } else { + /* SGID attribute can be NULL in following + * conditions. 
+ * (a) Alternative path + * (b) IB link layer without GRH + * (c) LAP send messages + */ + read_lock_irqsave(&cm.device_lock, flags); + list_for_each_entry(cm_dev, &cm.device_list, list) { + attr = rdma_find_gid(cm_dev->ib_device, + &path->sgid, + sa_conv_pathrec_to_gid_type(path), + NULL); + if (!IS_ERR(attr)) { + port = cm_dev->port[attr->port_num - 1]; + break; + } + } + read_unlock_irqrestore(&cm.device_lock, flags); + if (port) + rdma_put_gid_attr(attr); } - read_unlock_irqrestore(&cm.device_lock, flags); - - if (ndev) - dev_put(ndev); return port; } -static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, +static int cm_init_av_by_path(struct sa_path_rec *path, + const struct ib_gid_attr *sgid_attr, + struct cm_av *av, struct cm_id_private *cm_id_priv) { struct rdma_ah_attr new_ah_attr; @@ -540,7 +564,7 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, struct cm_port *port; int ret; - port = get_cm_port_from_path(path); + port = get_cm_port_from_path(path, sgid_attr); if (!port) return -EINVAL; cm_dev = port->cm_dev; @@ -554,21 +578,31 @@ static int cm_init_av_by_path(struct sa_path_rec *path, struct cm_av *av, /* * av->ah_attr might be initialized based on wc or during - * request processing time. So initialize a new ah_attr on stack. + * request processing time which might have reference to sgid_attr. + * So initialize a new ah_attr on stack. * If initialization fails, old ah_attr is used for sending any * responses. If initialization is successful, than new ah_attr - * is used by overwriting the old one. + * is used by overwriting the old one. So that right ah_attr + * can be used to return an error response. */ ret = ib_init_ah_attr_from_path(cm_dev->ib_device, port->port_num, path, - &new_ah_attr); + &new_ah_attr, sgid_attr); if (ret) return ret; av->timeout = path->packet_life_time + 1; ret = add_cm_id_to_port_list(cm_id_priv, av, port); - if (ret) + if (ret) { + rdma_cleanup_ah_attr_gid_attr(&new_ah_attr); return ret; + } + + /* + * Now that ah attribute is initialized based on path record, + * old ah attribute can be discarded. 
+ */ + rdma_cleanup_ah_attr_gid_attr(&av->ah_attr); memcpy(&av->ah_attr, &new_ah_attr, sizeof(new_ah_attr)); return 0; } @@ -1091,6 +1125,9 @@ static void cm_destroy_id(struct ib_cm_id *cm_id, int err) wait_for_completion(&cm_id_priv->comp); while ((work = cm_dequeue_work(cm_id_priv)) != NULL) cm_free_work(work); + + rdma_cleanup_ah_attr_gid_attr(&cm_id_priv->av.ah_attr); + rdma_cleanup_ah_attr_gid_attr(&cm_id_priv->alt_av.ah_attr); kfree(cm_id_priv->private_data); kfree(cm_id_priv); } @@ -1413,12 +1450,13 @@ int ib_send_cm_req(struct ib_cm_id *cm_id, goto out; } - ret = cm_init_av_by_path(param->primary_path, &cm_id_priv->av, + ret = cm_init_av_by_path(param->primary_path, + param->ppath_sgid_attr, &cm_id_priv->av, cm_id_priv); if (ret) goto error1; if (param->alternate_path) { - ret = cm_init_av_by_path(param->alternate_path, + ret = cm_init_av_by_path(param->alternate_path, NULL, &cm_id_priv->alt_av, cm_id_priv); if (ret) goto error1; @@ -1912,9 +1950,8 @@ static int cm_req_handler(struct cm_work *work) struct ib_cm_id *cm_id; struct cm_id_private *cm_id_priv, *listen_cm_id_priv; struct cm_req_msg *req_msg; - union ib_gid gid; - struct ib_gid_attr gid_attr; const struct ib_global_route *grh; + const struct ib_gid_attr *gid_attr; int ret; req_msg = (struct cm_req_msg *)work->mad_recv_wc->recv_buf.mad; @@ -1959,24 +1996,13 @@ static int cm_req_handler(struct cm_work *work) if (cm_req_has_alt_path(req_msg)) memset(&work->path[1], 0, sizeof(work->path[1])); grh = rdma_ah_read_grh(&cm_id_priv->av.ah_attr); - ret = ib_get_cached_gid(work->port->cm_dev->ib_device, - work->port->port_num, - grh->sgid_index, - &gid, &gid_attr); - if (ret) { - ib_send_cm_rej(cm_id, IB_CM_REJ_UNSUPPORTED, NULL, 0, NULL, 0); - goto rejected; - } + gid_attr = grh->sgid_attr; - if (gid_attr.ndev) { + if (gid_attr && gid_attr->ndev) { work->path[0].rec_type = - sa_conv_gid_to_pathrec_type(gid_attr.gid_type); - sa_path_set_ifindex(&work->path[0], - gid_attr.ndev->ifindex); - sa_path_set_ndev(&work->path[0], - dev_net(gid_attr.ndev)); - dev_put(gid_attr.ndev); + sa_conv_gid_to_pathrec_type(gid_attr->gid_type); } else { + /* If no GID attribute or ndev is null, it is not RoCE. 
*/ cm_path_set_rec_type(work->port->cm_dev->ib_device, work->port->port_num, &work->path[0], @@ -1990,7 +2016,7 @@ static int cm_req_handler(struct cm_work *work) sa_path_set_dmac(&work->path[0], cm_id_priv->av.ah_attr.roce.dmac); work->path[0].hop_limit = grh->hop_limit; - ret = cm_init_av_by_path(&work->path[0], &cm_id_priv->av, + ret = cm_init_av_by_path(&work->path[0], gid_attr, &cm_id_priv->av, cm_id_priv); if (ret) { int err; @@ -2010,8 +2036,8 @@ static int cm_req_handler(struct cm_work *work) goto rejected; } if (cm_req_has_alt_path(req_msg)) { - ret = cm_init_av_by_path(&work->path[1], &cm_id_priv->alt_av, - cm_id_priv); + ret = cm_init_av_by_path(&work->path[1], NULL, + &cm_id_priv->alt_av, cm_id_priv); if (ret) { ib_send_cm_rej(cm_id, IB_CM_REJ_INVALID_ALT_GID, &work->path[0].sgid, @@ -3134,7 +3160,7 @@ int ib_send_cm_lap(struct ib_cm_id *cm_id, goto out; } - ret = cm_init_av_by_path(alternate_path, &cm_id_priv->alt_av, + ret = cm_init_av_by_path(alternate_path, NULL, &cm_id_priv->alt_av, cm_id_priv); if (ret) goto out; @@ -3277,7 +3303,7 @@ static int cm_lap_handler(struct cm_work *work) if (ret) goto unlock; - cm_init_av_by_path(param->alternate_path, &cm_id_priv->alt_av, + cm_init_av_by_path(param->alternate_path, NULL, &cm_id_priv->alt_av, cm_id_priv); cm_id_priv->id.lap_state = IB_CM_LAP_RCVD; cm_id_priv->tid = lap_msg->hdr.tid; @@ -3479,7 +3505,9 @@ int ib_send_cm_sidr_req(struct ib_cm_id *cm_id, return -EINVAL; cm_id_priv = container_of(cm_id, struct cm_id_private, id); - ret = cm_init_av_by_path(param->path, &cm_id_priv->av, cm_id_priv); + ret = cm_init_av_by_path(param->path, param->sgid_attr, + &cm_id_priv->av, + cm_id_priv); if (ret) goto out; @@ -3663,7 +3691,8 @@ error: spin_unlock_irqrestore(&cm_id_priv->lock, flags); } EXPORT_SYMBOL(ib_send_cm_sidr_rep); -static void cm_format_sidr_rep_event(struct cm_work *work) +static void cm_format_sidr_rep_event(struct cm_work *work, + const struct cm_id_private *cm_id_priv) { struct cm_sidr_rep_msg *sidr_rep_msg; struct ib_cm_sidr_rep_event_param *param; @@ -3676,6 +3705,7 @@ static void cm_format_sidr_rep_event(struct cm_work *work) param->qpn = be32_to_cpu(cm_sidr_rep_get_qpn(sidr_rep_msg)); param->info = &sidr_rep_msg->info; param->info_len = sidr_rep_msg->info_length; + param->sgid_attr = cm_id_priv->av.ah_attr.grh.sgid_attr; work->cm_event.private_data = &sidr_rep_msg->private_data; } @@ -3699,7 +3729,7 @@ static int cm_sidr_rep_handler(struct cm_work *work) ib_cancel_mad(cm_id_priv->av.port->mad_agent, cm_id_priv->msg); spin_unlock_irq(&cm_id_priv->lock); - cm_format_sidr_rep_event(work); + cm_format_sidr_rep_event(work, cm_id_priv); cm_process_work(cm_id_priv, work); return 0; out: diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index a403e679c6c1..349294b2927a 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -575,46 +575,54 @@ static int cma_translate_addr(struct sockaddr *addr, struct rdma_dev_addr *dev_a return ret; } -static inline int cma_validate_port(struct ib_device *device, u8 port, - enum ib_gid_type gid_type, - union ib_gid *gid, - struct rdma_id_private *id_priv) +static const struct ib_gid_attr * +cma_validate_port(struct ib_device *device, u8 port, + enum ib_gid_type gid_type, + union ib_gid *gid, + struct rdma_id_private *id_priv) { struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; int bound_if_index = dev_addr->bound_dev_if; + const struct ib_gid_attr *sgid_attr; int dev_type = dev_addr->dev_type; struct net_device *ndev = 
NULL; - int ret = -ENODEV; if ((dev_type == ARPHRD_INFINIBAND) && !rdma_protocol_ib(device, port)) - return ret; + return ERR_PTR(-ENODEV); if ((dev_type != ARPHRD_INFINIBAND) && rdma_protocol_ib(device, port)) - return ret; + return ERR_PTR(-ENODEV); if (dev_type == ARPHRD_ETHER && rdma_protocol_roce(device, port)) { ndev = dev_get_by_index(dev_addr->net, bound_if_index); if (!ndev) - return ret; + return ERR_PTR(-ENODEV); } else { gid_type = IB_GID_TYPE_IB; } - ret = ib_find_cached_gid_by_port(device, gid, gid_type, port, - ndev, NULL); - + sgid_attr = rdma_find_gid_by_port(device, gid, gid_type, port, ndev); if (ndev) dev_put(ndev); + return sgid_attr; +} - return ret; +static void cma_bind_sgid_attr(struct rdma_id_private *id_priv, + const struct ib_gid_attr *sgid_attr) +{ + WARN_ON(id_priv->id.route.addr.dev_addr.sgid_attr); + id_priv->id.route.addr.dev_addr.sgid_attr = sgid_attr; } static int cma_acquire_dev(struct rdma_id_private *id_priv, struct rdma_id_private *listen_id_priv) { struct rdma_dev_addr *dev_addr = &id_priv->id.route.addr.dev_addr; + const struct ib_gid_attr *sgid_attr; struct cma_device *cma_dev; union ib_gid gid, iboe_gid, *gidp; + enum ib_gid_type gid_type; + enum ib_gid_type default_type; int ret = -ENODEV; u8 port; @@ -634,14 +642,15 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, port = listen_id_priv->id.port_num; gidp = rdma_protocol_roce(cma_dev->device, port) ? &iboe_gid : &gid; - - ret = cma_validate_port(cma_dev->device, port, - rdma_protocol_ib(cma_dev->device, port) ? - IB_GID_TYPE_IB : - listen_id_priv->gid_type, gidp, - id_priv); - if (!ret) { + gid_type = rdma_protocol_ib(cma_dev->device, port) ? + IB_GID_TYPE_IB : + listen_id_priv->gid_type; + sgid_attr = cma_validate_port(cma_dev->device, port, + gid_type, gidp, id_priv); + if (!IS_ERR(sgid_attr)) { id_priv->id.port_num = port; + cma_bind_sgid_attr(id_priv, sgid_attr); + ret = 0; goto out; } } @@ -655,14 +664,16 @@ static int cma_acquire_dev(struct rdma_id_private *id_priv, gidp = rdma_protocol_roce(cma_dev->device, port) ? &iboe_gid : &gid; - - ret = cma_validate_port(cma_dev->device, port, - rdma_protocol_ib(cma_dev->device, port) ? - IB_GID_TYPE_IB : - cma_dev->default_gid_type[port - 1], - gidp, id_priv); - if (!ret) { + default_type = cma_dev->default_gid_type[port - 1]; + gid_type = + rdma_protocol_ib(cma_dev->device, port) ? 
+ IB_GID_TYPE_IB : default_type; + sgid_attr = cma_validate_port(cma_dev->device, port, + gid_type, gidp, id_priv); + if (!IS_ERR(sgid_attr)) { id_priv->id.port_num = port; + cma_bind_sgid_attr(id_priv, sgid_attr); + ret = 0; goto out; } } @@ -1671,6 +1682,10 @@ void rdma_destroy_id(struct rdma_cm_id *id) cma_deref_id(id_priv->id.context); kfree(id_priv->id.route.path_rec); + + if (id_priv->id.route.addr.dev_addr.sgid_attr) + rdma_put_gid_attr(id_priv->id.route.addr.dev_addr.sgid_attr); + put_net(id_priv->id.route.addr.dev_addr.net); kfree(id_priv); } @@ -2533,8 +2548,6 @@ cma_iboe_set_path_rec_l2_fields(struct rdma_id_private *id_priv) route->path_rec->rec_type = sa_conv_gid_to_pathrec_type(gid_type); route->path_rec->roce.route_resolved = true; - sa_path_set_ndev(route->path_rec, addr->dev_addr.net); - sa_path_set_ifindex(route->path_rec, ndev->ifindex); sa_path_set_dmac(route->path_rec, addr->dev_addr.dst_dev_addr); return ndev; } @@ -3460,7 +3473,8 @@ static int cma_sidr_rep_handler(struct ib_cm_id *cm_id, ib_init_ah_attr_from_path(id_priv->id.device, id_priv->id.port_num, id_priv->id.route.path_rec, - &event.param.ud.ah_attr); + &event.param.ud.ah_attr, + rep->sgid_attr); event.param.ud.qp_num = rep->qpn; event.param.ud.qkey = rep->qkey; event.event = RDMA_CM_EVENT_ESTABLISHED; @@ -3473,6 +3487,8 @@ static int cma_sidr_rep_handler(struct ib_cm_id *cm_id, } ret = id_priv->id.event_handler(&id_priv->id, &event); + + rdma_cleanup_ah_attr_gid_attr(&event.param.ud.ah_attr); if (ret) { /* Destroy the CM ID by returning a non-zero value. */ id_priv->cm_id.ib = NULL; @@ -3529,6 +3545,7 @@ static int cma_resolve_ib_udp(struct rdma_id_private *id_priv, id_priv->cm_id.ib = id; req.path = id_priv->id.route.path_rec; + req.sgid_attr = id_priv->id.route.addr.dev_addr.sgid_attr; req.service_id = rdma_get_service_id(&id_priv->id, cma_dst_addr(id_priv)); req.timeout_ms = 1 << (CMA_CM_RESPONSE_TIMEOUT - 8); req.max_cm_retries = CMA_MAX_CM_RETRIES; @@ -3589,7 +3606,7 @@ static int cma_connect_ib(struct rdma_id_private *id_priv, req.primary_path = &route->path_rec[0]; if (route->num_paths == 2) req.alternate_path = &route->path_rec[1]; - + req.ppath_sgid_attr = id_priv->id.route.addr.dev_addr.sgid_attr; req.service_id = rdma_get_service_id(&id_priv->id, cma_dst_addr(id_priv)); req.qp_num = id_priv->qp_num; req.qp_type = id_priv->id.qp_type; @@ -3953,6 +3970,8 @@ static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast) event.event = RDMA_CM_EVENT_MULTICAST_ERROR; ret = id_priv->id.event_handler(&id_priv->id, &event); + + rdma_cleanup_ah_attr_gid_attr(&event.param.ud.ah_attr); if (ret) { cma_exch(id_priv, RDMA_CM_DESTROYING); mutex_unlock(&id_priv->handler_mutex); diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c index 4eb72ff539fc..2051072bd3ad 100644 --- a/drivers/infiniband/core/multicast.c +++ b/drivers/infiniband/core/multicast.c @@ -722,8 +722,7 @@ int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num, enum ib_gid_type gid_type, struct rdma_ah_attr *ah_attr) { - int ret; - u16 gid_index; + const struct ib_gid_attr *sgid_attr; /* GID table is not based on the netdevice for IB link layer, * so ignore ndev during search. 
@@ -733,24 +732,24 @@ int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num, else if (!rdma_protocol_roce(device, port_num)) return -EINVAL; - ret = ib_find_cached_gid_by_port(device, &rec->port_gid, - gid_type, port_num, - ndev, - &gid_index); - if (ret) - return ret; + sgid_attr = rdma_find_gid_by_port(device, &rec->port_gid, + gid_type, port_num, + ndev); + if (IS_ERR(sgid_attr)) + return PTR_ERR(sgid_attr); - memset(ah_attr, 0, sizeof *ah_attr); + memset(ah_attr, 0, sizeof(*ah_attr)); ah_attr->type = rdma_ah_find_type(device, port_num); rdma_ah_set_dlid(ah_attr, be16_to_cpu(rec->mlid)); rdma_ah_set_sl(ah_attr, rec->sl); rdma_ah_set_port_num(ah_attr, port_num); rdma_ah_set_static_rate(ah_attr, rec->rate); + rdma_ah_set_grh_sgid_attr(ah_attr, sgid_attr); rdma_ah_set_grh(ah_attr, &rec->mgid, be32_to_cpu(rec->flow_label), - (u8)gid_index, + sgid_attr->index, rec->hop_limit, rec->traffic_class); return 0; diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c index 6fd1018a6c1b..2855b98e4599 100644 --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -1229,18 +1229,12 @@ static u8 get_src_path_mask(struct ib_device *device, u8 port_num) static int roce_resolve_route_from_path(struct ib_device *device, u8 port_num, - struct sa_path_rec *rec) + struct sa_path_rec *rec, + const struct ib_gid_attr *attr) { struct net_device *resolved_dev; - struct net_device *ndev; struct net_device *idev; - struct rdma_dev_addr dev_addr = { - .bound_dev_if = ((sa_path_get_ifindex(rec) >= 0) ? - sa_path_get_ifindex(rec) : 0), - .net = sa_path_get_ndev(rec) ? - sa_path_get_ndev(rec) : - &init_net - }; + struct rdma_dev_addr dev_addr = {}; union { struct sockaddr _sockaddr; struct sockaddr_in _sockaddr_in; @@ -1250,6 +1244,14 @@ roce_resolve_route_from_path(struct ib_device *device, u8 port_num, if (rec->roce.route_resolved) return 0; + if (!attr || !attr->ndev) + return -EINVAL; + + dev_addr.bound_dev_if = attr->ndev->ifindex; + /* TODO: Use net from the ib_gid_attr once it is added to it, + * until than, limit itself to init_net. 
+ */ + dev_addr.net = &init_net; if (!device->get_netdev) return -EOPNOTSUPP; @@ -1278,16 +1280,13 @@ roce_resolve_route_from_path(struct ib_device *device, u8 port_num, ret = -ENODEV; goto done; } - ndev = ib_get_ndev_from_path(rec); rcu_read_lock(); - if ((ndev && ndev != resolved_dev) || + if (attr->ndev != resolved_dev || (resolved_dev != idev && !rdma_is_upper_dev_rcu(idev, resolved_dev))) ret = -EHOSTUNREACH; rcu_read_unlock(); dev_put(resolved_dev); - if (ndev) - dev_put(ndev); done: dev_put(idev); if (!ret) @@ -1297,31 +1296,50 @@ roce_resolve_route_from_path(struct ib_device *device, u8 port_num, static int init_ah_attr_grh_fields(struct ib_device *device, u8 port_num, struct sa_path_rec *rec, - struct rdma_ah_attr *ah_attr) + struct rdma_ah_attr *ah_attr, + const struct ib_gid_attr *gid_attr) { enum ib_gid_type type = sa_conv_pathrec_to_gid_type(rec); - struct net_device *ndev; - u16 gid_index; - int ret; + const struct ib_gid_attr *sgid_attr; - ndev = ib_get_ndev_from_path(rec); - ret = ib_find_cached_gid_by_port(device, &rec->sgid, type, - port_num, ndev, &gid_index); - if (ndev) - dev_put(ndev); - if (ret) - return ret; + if (gid_attr) + sgid_attr = rdma_hold_gid_attr(gid_attr); + else + sgid_attr = + rdma_find_gid_by_port(device, &rec->sgid, + type, + port_num, NULL); + if (IS_ERR(sgid_attr)) + return PTR_ERR(sgid_attr); rdma_ah_set_grh(ah_attr, &rec->dgid, be32_to_cpu(rec->flow_label), - gid_index, rec->hop_limit, + sgid_attr->index, rec->hop_limit, rec->traffic_class); + rdma_ah_set_grh_sgid_attr(ah_attr, sgid_attr); return 0; } +/** + * ib_init_ah_attr_from_path - Initialize address handle attributes based on + * an SA path record. + * @device: Device associated ah attributes initialization. + * @port_num: Port on the specified device. + * @rec: path record entry to use for ah attributes initialization. + * @ah_attr: address handle attributes to initialization from path record. + * @sgid_attr: SGID attribute to consider during initialization. + * + * When ib_init_ah_attr_from_path() returns success, + * (a) for IB link layer it optionally contains a reference to SGID attribute + * when GRH is present for IB link layer. + * (b) for RoCE link layer it contains a reference to SGID attribute. + * User must invoke ib_cleanup_ah_attr_gid_attr() to release reference to SGID + * attributes which are initialized using ib_init_ah_attr_from_path(). 
+ */ int ib_init_ah_attr_from_path(struct ib_device *device, u8 port_num, struct sa_path_rec *rec, - struct rdma_ah_attr *ah_attr) + struct rdma_ah_attr *ah_attr, + const struct ib_gid_attr *gid_attr) { int ret = 0; @@ -1332,7 +1350,8 @@ int ib_init_ah_attr_from_path(struct ib_device *device, u8 port_num, rdma_ah_set_static_rate(ah_attr, rec->rate); if (sa_path_is_roce(rec)) { - ret = roce_resolve_route_from_path(device, port_num, rec); + ret = roce_resolve_route_from_path(device, port_num, rec, + gid_attr); if (ret) return ret; @@ -1349,7 +1368,8 @@ int ib_init_ah_attr_from_path(struct ib_device *device, u8 port_num, } if (rec->hop_limit > 0 || sa_path_is_roce(rec)) - ret = init_ah_attr_grh_fields(device, port_num, rec, ah_attr); + ret = init_ah_attr_grh_fields(device, port_num, + rec, ah_attr, gid_attr); return ret; } EXPORT_SYMBOL(ib_init_ah_attr_from_path); @@ -1557,8 +1577,6 @@ static void ib_sa_path_rec_callback(struct ib_sa_query *sa_query, ARRAY_SIZE(path_rec_table), mad->data, &rec); rec.rec_type = SA_PATH_REC_TYPE_IB; - sa_path_set_ndev(&rec, NULL); - sa_path_set_ifindex(&rec, 0); sa_path_set_dmac_zero(&rec); if (query->conv_pr) { diff --git a/drivers/infiniband/core/user_mad.c b/drivers/infiniband/core/user_mad.c index bb98c9e4a7fd..e0a107e514c3 100644 --- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -268,6 +268,7 @@ static void recv_handler(struct ib_mad_agent *agent, packet->mad.hdr.traffic_class = grh->traffic_class; memcpy(packet->mad.hdr.gid, &grh->dgid, 16); packet->mad.hdr.flow_label = cpu_to_be32(grh->flow_label); + rdma_cleanup_ah_attr_gid_attr(&ah_attr); } if (queue_packet(file, agent, packet)) diff --git a/drivers/infiniband/core/uverbs_marshall.c b/drivers/infiniband/core/uverbs_marshall.c index bb372b4713a4..b8d715c68ca4 100644 --- a/drivers/infiniband/core/uverbs_marshall.c +++ b/drivers/infiniband/core/uverbs_marshall.c @@ -211,7 +211,5 @@ void ib_copy_path_rec_from_user(struct sa_path_rec *dst, /* TODO: No need to set this */ sa_path_set_dmac_zero(dst); - sa_path_set_ndev(dst, NULL); - sa_path_set_ifindex(dst, 0); } EXPORT_SYMBOL(ib_copy_path_rec_from_user); diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 6ddfb1fade79..f11da3fb31dd 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -455,16 +455,16 @@ static bool find_gid_index(const union ib_gid *gid, return true; } -static int get_sgid_index_from_eth(struct ib_device *device, u8 port_num, - u16 vlan_id, const union ib_gid *sgid, - enum ib_gid_type gid_type, - u16 *gid_index) +static const struct ib_gid_attr * +get_sgid_attr_from_eth(struct ib_device *device, u8 port_num, + u16 vlan_id, const union ib_gid *sgid, + enum ib_gid_type gid_type) { struct find_gid_index_context context = {.vlan_id = vlan_id, .gid_type = gid_type}; - return ib_find_gid_by_filter(device, sgid, port_num, find_gid_index, - &context, gid_index); + return rdma_find_gid_by_filter(device, sgid, port_num, find_gid_index, + &context); } int ib_get_gids_from_rdma_hdr(const union rdma_network_hdr *hdr, @@ -506,42 +506,30 @@ EXPORT_SYMBOL(ib_get_gids_from_rdma_hdr); * ah_attribute must have have valid port_num, sgid_index. 
*/ static int ib_resolve_unicast_gid_dmac(struct ib_device *device, - struct rdma_ah_attr *ah_attr) + struct rdma_ah_attr *ah_attr, + const union ib_gid *sgid, + const struct ib_gid_attr *sgid_attr) { - struct ib_gid_attr sgid_attr; struct ib_global_route *grh; int hop_limit = 0xff; - union ib_gid sgid; - int ret; + int ret = 0; grh = rdma_ah_retrieve_grh(ah_attr); - ret = ib_query_gid(device, - rdma_ah_get_port_num(ah_attr), - grh->sgid_index, - &sgid, &sgid_attr); - if (ret || !sgid_attr.ndev) { - if (!ret) - ret = -ENXIO; - return ret; - } - /* If destination is link local and source GID is RoCEv1, * IP stack is not used. */ if (rdma_link_local_addr((struct in6_addr *)grh->dgid.raw) && - sgid_attr.gid_type == IB_GID_TYPE_ROCE) { + sgid_attr->gid_type == IB_GID_TYPE_ROCE) { rdma_get_ll_mac((struct in6_addr *)grh->dgid.raw, ah_attr->roce.dmac); goto done; } - ret = rdma_addr_find_l2_eth_by_grh(&sgid, &grh->dgid, + ret = rdma_addr_find_l2_eth_by_grh(sgid, &grh->dgid, ah_attr->roce.dmac, - sgid_attr.ndev, &hop_limit); + sgid_attr->ndev, &hop_limit); done: - dev_put(sgid_attr.ndev); - grh->hop_limit = hop_limit; return ret; } @@ -565,6 +553,7 @@ int ib_init_ah_attr_from_wc(struct ib_device *device, u8 port_num, int ret; enum rdma_network_type net_type = RDMA_NETWORK_IB; enum ib_gid_type gid_type = IB_GID_TYPE_IB; + const struct ib_gid_attr *sgid_attr = NULL; int hoplimit = 0xff; union ib_gid dgid; union ib_gid sgid; @@ -595,30 +584,38 @@ int ib_init_ah_attr_from_wc(struct ib_device *device, u8 port_num, if (!(wc->wc_flags & IB_WC_GRH)) return -EPROTOTYPE; - ret = get_sgid_index_from_eth(device, port_num, - vlan_id, &dgid, - gid_type, &gid_index); - if (ret) - return ret; + sgid_attr = get_sgid_attr_from_eth(device, port_num, + vlan_id, &dgid, + gid_type); + if (IS_ERR_OR_NULL(sgid_attr)) + return PTR_ERR(sgid_attr); flow_class = be32_to_cpu(grh->version_tclass_flow); rdma_ah_set_grh(ah_attr, &sgid, flow_class & 0xFFFFF, - (u8)gid_index, hoplimit, + (u8)sgid_attr->index, hoplimit, (flow_class >> 20) & 0xFF); - return ib_resolve_unicast_gid_dmac(device, ah_attr); + ret = ib_resolve_unicast_gid_dmac(device, ah_attr, + &dgid, sgid_attr); + if (ret) + rdma_put_gid_attr(sgid_attr); + else + rdma_ah_set_grh_sgid_attr(ah_attr, sgid_attr); + return ret; } else { rdma_ah_set_dlid(ah_attr, wc->slid); rdma_ah_set_path_bits(ah_attr, wc->dlid_path_bits); if (wc->wc_flags & IB_WC_GRH) { if (dgid.global.interface_id != cpu_to_be64(IB_SA_WELL_KNOWN_GUID)) { - ret = ib_find_cached_gid_by_port(device, &dgid, - IB_GID_TYPE_IB, - port_num, NULL, - &gid_index); - if (ret) - return ret; + sgid_attr = + rdma_find_gid_by_port(device, + &dgid, + IB_GID_TYPE_IB, + port_num, NULL); + if (IS_ERR_OR_NULL(sgid_attr)) + return PTR_ERR(sgid_attr); + gid_index = sgid_attr->index; } else { gid_index = 0; } @@ -628,23 +625,47 @@ int ib_init_ah_attr_from_wc(struct ib_device *device, u8 port_num, flow_class & 0xFFFFF, (u8)gid_index, hoplimit, (flow_class >> 20) & 0xFF); + rdma_ah_set_grh_sgid_attr(ah_attr, sgid_attr); } return 0; } } EXPORT_SYMBOL(ib_init_ah_attr_from_wc); +/** + * rdma_cleanup_ah_attr_gid_attr - Release reference to SGID attribute of + * ah attribute. + * @ah_attr: Pointer to ah attribute previously initialized using + * ib_init_ah_attr_from_wc() or using ib_init_ah_attr_from_path(). + * + * Release reference to the SGID attribute of the ah attribute if it is + * non NULL. 
+ * + */ +void rdma_cleanup_ah_attr_gid_attr(struct rdma_ah_attr *ah_attr) +{ + if (ah_attr->grh.sgid_attr) { + rdma_put_gid_attr(ah_attr->grh.sgid_attr); + ah_attr->grh.sgid_attr = NULL; + } +} +EXPORT_SYMBOL(rdma_cleanup_ah_attr_gid_attr); + struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, const struct ib_wc *wc, const struct ib_grh *grh, u8 port_num) { struct rdma_ah_attr ah_attr; + struct ib_ah *ah; int ret; ret = ib_init_ah_attr_from_wc(pd->device, port_num, wc, grh, &ah_attr); if (ret) return ERR_PTR(ret); - return rdma_create_ah(pd, &ah_attr); + ah = rdma_create_ah(pd, &ah_attr); + + rdma_cleanup_ah_attr_gid_attr(&ah_attr); + return ah; } EXPORT_SYMBOL(ib_create_ah_from_wc); @@ -1312,7 +1333,19 @@ static int ib_resolve_eth_dmac(struct ib_device *device, (char *)ah_attr->roce.dmac); } } else { - ret = ib_resolve_unicast_gid_dmac(device, ah_attr); + const struct ib_gid_attr *sgid_attr; + union ib_gid sgid; + + sgid_attr = rdma_get_gid_attr(device, + rdma_ah_get_port_num(ah_attr), + grh->sgid_index, + &sgid); + if (IS_ERR(sgid_attr)) + return PTR_ERR(sgid_attr); + + ret = ib_resolve_unicast_gid_dmac(device, ah_attr, + &sgid, sgid_attr); + rdma_put_gid_attr(sgid_attr); } return ret; } diff --git a/drivers/infiniband/sw/rxe/rxe_recv.c b/drivers/infiniband/sw/rxe/rxe_recv.c index dfba44a40f0b..42797ac6f7b1 100644 --- a/drivers/infiniband/sw/rxe/rxe_recv.c +++ b/drivers/infiniband/sw/rxe/rxe_recv.c @@ -328,6 +328,7 @@ static void rxe_rcv_mcast_pkt(struct rxe_dev *rxe, struct sk_buff *skb) static int rxe_match_dgid(struct rxe_dev *rxe, struct sk_buff *skb) { + const struct ib_gid_attr *gid_attr; union ib_gid dgid; union ib_gid *pdgid; @@ -339,9 +340,14 @@ static int rxe_match_dgid(struct rxe_dev *rxe, struct sk_buff *skb) pdgid = (union ib_gid *)&ipv6_hdr(skb)->daddr; } - return ib_find_cached_gid_by_port(&rxe->ib_dev, pdgid, - IB_GID_TYPE_ROCE_UDP_ENCAP, - 1, skb->dev, NULL); + gid_attr = rdma_find_gid_by_port(&rxe->ib_dev, pdgid, + IB_GID_TYPE_ROCE_UDP_ENCAP, + 1, skb->dev); + if (IS_ERR(gid_attr)) + return PTR_ERR(gid_attr); + + rdma_put_gid_attr(gid_attr); + return 0; } /* rxe_rcv is called from the interface driver */ diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c index cf291f90b58f..4031217d9c7a 100644 --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c @@ -769,8 +769,10 @@ static void path_rec_completion(int status, struct rdma_ah_attr av; if (!ib_init_ah_attr_from_path(priv->ca, priv->port, - pathrec, &av)) + pathrec, &av, NULL)) { ah = ipoib_create_ah(dev, priv->pd, &av); + rdma_cleanup_ah_attr_gid_attr(&av); + } } spin_lock_irqsave(&priv->lock, flags); diff --git a/include/rdma/ib_addr.h b/include/rdma/ib_addr.h index c2c8b1fdeead..715394f6d18a 100644 --- a/include/rdma/ib_addr.h +++ b/include/rdma/ib_addr.h @@ -58,6 +58,7 @@ * @bound_dev_if: An optional device interface index. * @transport: The transport type used. * @net: Network namespace containing the bound_dev_if net_dev. 
+ * @sgid_attr: GID attribute to use for identified SGID */ struct rdma_dev_addr { unsigned char src_dev_addr[MAX_ADDR_LEN]; @@ -67,6 +68,7 @@ struct rdma_dev_addr { int bound_dev_if; enum rdma_transport_type transport; struct net *net; + const struct ib_gid_attr *sgid_attr; enum rdma_network_type network; int hoplimit; }; diff --git a/include/rdma/ib_cache.h b/include/rdma/ib_cache.h index 64b586e83557..83f53c5ed2bd 100644 --- a/include/rdma/ib_cache.h +++ b/include/rdma/ib_cache.h @@ -55,27 +55,28 @@ int ib_get_cached_gid(struct ib_device *device, union ib_gid *gid, struct ib_gid_attr *attr); -int ib_find_cached_gid(struct ib_device *device, - const union ib_gid *gid, - enum ib_gid_type gid_type, - struct net_device *ndev, - u8 *port_num, - u16 *index); +const struct ib_gid_attr * +rdma_find_gid(struct ib_device *device, + const union ib_gid *gid, + enum ib_gid_type gid_type, + struct net_device *ndev); + +const struct ib_gid_attr * +rdma_find_gid_by_port(struct ib_device *device, + const union ib_gid *gid, + enum ib_gid_type gid_type, + u8 port_num, + struct net_device *ndev); -int ib_find_cached_gid_by_port(struct ib_device *device, - const union ib_gid *gid, - enum ib_gid_type gid_type, - u8 port_num, - struct net_device *ndev, - u16 *index); +const struct ib_gid_attr * +rdma_find_gid_by_filter(struct ib_device *device, + const union ib_gid *gid, + u8 port_num, + bool (*filter)(const union ib_gid *gid, + const struct ib_gid_attr *, + void *), + void *context); -int ib_find_gid_by_filter(struct ib_device *device, - const union ib_gid *gid, - u8 port_num, - bool (*filter)(const union ib_gid *gid, - const struct ib_gid_attr *, - void *), - void *context, u16 *index); /** * ib_get_cached_pkey - Returns a cached PKey table entry * @device: The device to query. diff --git a/include/rdma/ib_cm.h b/include/rdma/ib_cm.h index 7979cb04f529..c98d603c0b63 100644 --- a/include/rdma/ib_cm.h +++ b/include/rdma/ib_cm.h @@ -246,6 +246,7 @@ struct ib_cm_sidr_rep_event_param { u32 qkey; u32 qpn; void *info; + const struct ib_gid_attr *sgid_attr; u8 info_len; }; @@ -365,6 +366,7 @@ struct ib_cm_id *ib_cm_insert_listen(struct ib_device *device, struct ib_cm_req_param { struct sa_path_rec *primary_path; struct sa_path_rec *alternate_path; + const struct ib_gid_attr *ppath_sgid_attr; __be64 service_id; u32 qp_num; enum ib_qp_type qp_type; @@ -566,6 +568,7 @@ int ib_send_cm_apr(struct ib_cm_id *cm_id, struct ib_cm_sidr_req_param { struct sa_path_rec *path; + const struct ib_gid_attr *sgid_attr; __be64 service_id; int timeout_ms; const void *private_data; diff --git a/include/rdma/ib_sa.h b/include/rdma/ib_sa.h index bacb144f7780..b6ddf2a1b9d8 100644 --- a/include/rdma/ib_sa.h +++ b/include/rdma/ib_sa.h @@ -172,12 +172,7 @@ struct sa_path_rec_ib { */ struct sa_path_rec_roce { bool route_resolved; - u8 dmac[ETH_ALEN]; - /* ignored in IB */ - int ifindex; - /* ignored in IB */ - struct net *net; - + u8 dmac[ETH_ALEN]; }; struct sa_path_rec_opa { @@ -556,13 +551,10 @@ int ib_init_ah_from_mcmember(struct ib_device *device, u8 port_num, enum ib_gid_type gid_type, struct rdma_ah_attr *ah_attr); -/** - * ib_init_ah_attr_from_path - Initialize address handle attributes based on - * an SA path record. 
- */ int ib_init_ah_attr_from_path(struct ib_device *device, u8 port_num, struct sa_path_rec *rec, - struct rdma_ah_attr *ah_attr); + struct rdma_ah_attr *ah_attr, + const struct ib_gid_attr *sgid_attr); /** * ib_sa_pack_path - Conert a path record from struct ib_sa_path_rec @@ -667,45 +659,10 @@ static inline void sa_path_set_dmac_zero(struct sa_path_rec *rec) eth_zero_addr(rec->roce.dmac); } -static inline void sa_path_set_ifindex(struct sa_path_rec *rec, int ifindex) -{ - if (sa_path_is_roce(rec)) - rec->roce.ifindex = ifindex; -} - -static inline void sa_path_set_ndev(struct sa_path_rec *rec, struct net *net) -{ - if (sa_path_is_roce(rec)) - rec->roce.net = net; -} - static inline u8 *sa_path_get_dmac(struct sa_path_rec *rec) { if (sa_path_is_roce(rec)) return rec->roce.dmac; return NULL; } - -static inline int sa_path_get_ifindex(struct sa_path_rec *rec) -{ - if (sa_path_is_roce(rec)) - return rec->roce.ifindex; - return 0; -} - -static inline struct net *sa_path_get_ndev(struct sa_path_rec *rec) -{ - if (sa_path_is_roce(rec)) - return rec->roce.net; - return NULL; -} - -static inline struct net_device *ib_get_ndev_from_path(struct sa_path_rec *rec) -{ - return sa_path_get_ndev(rec) ? - dev_get_by_index(sa_path_get_ndev(rec), - sa_path_get_ifindex(rec)) - : NULL; -} - #endif /* IB_SA_H */ diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index 2323a5624161..3bd521e98614 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -690,11 +690,12 @@ struct ib_event_handler { } while (0) struct ib_global_route { - union ib_gid dgid; - u32 flow_label; - u8 sgid_index; - u8 hop_limit; - u8 traffic_class; + union ib_gid dgid; + const struct ib_gid_attr *sgid_attr; + u32 flow_label; + u8 sgid_index; + u8 hop_limit; + u8 traffic_class; }; struct ib_grh { @@ -3107,6 +3108,13 @@ int ib_get_rdma_header_version(const union rdma_network_hdr *hdr); * ignored unless the work completion indicates that the GRH is valid. * @ah_attr: Returned attributes that can be used when creating an address * handle for replying to the message. + * When ib_init_ah_attr_from_wc() returns success, + * (a) for IB link layer it optionally contains a reference to SGID attribute + * when GRH is present for IB link layer. + * (b) for RoCE link layer it contains a reference to SGID attribute. + * User must invoke rdma_cleanup_ah_attr_gid_attr() to release reference to SGID + * attributes which are initialized using ib_init_ah_attr_from_wc(). + * */ int ib_init_ah_attr_from_wc(struct ib_device *device, u8 port_num, const struct ib_wc *wc, const struct ib_grh *grh, @@ -3127,6 +3135,8 @@ int ib_init_ah_attr_from_wc(struct ib_device *device, u8 port_num, struct ib_ah *ib_create_ah_from_wc(struct ib_pd *pd, const struct ib_wc *wc, const struct ib_grh *grh, u8 port_num); +void rdma_cleanup_ah_attr_gid_attr(struct rdma_ah_attr *ah_attr); + /** * rdma_modify_ah - Modifies the address vector associated with an address * handle. @@ -3997,6 +4007,20 @@ static inline enum rdma_ah_attr_type rdma_ah_find_type(struct ib_device *dev, return RDMA_AH_ATTR_TYPE_UNDEFINED; } +/** + * rdma_ah_set_grh_sgid_attr - Sets the sgid attribute of GRH + * + * @attr: Pointer to AH attribute structure + * @sgid_attr: Pointer to SGID attribute structure + * + */ +static inline +void rdma_ah_set_grh_sgid_attr(struct rdma_ah_attr *attr, + const struct ib_gid_attr *sgid_attr) +{ + attr->grh.sgid_attr = sgid_attr; +} + /** * ib_lid_cpu16 - Return lid in 16bit CPU encoding. 
* In the current implementation the only way to get -- 2.14.3