RE: [PATCH v2 1/2] RDMA/Core: add RDMA_NLDEV_CMD_NEWLINK/DELLINK support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



>-----Original Message-----
>From: Steve Wise [mailto:swise@xxxxxxxxxxxxxxxxxxxxx]
>Sent: Thursday, November 15, 2018 5:04 PM
>To: Ruhl, Michael J <michael.j.ruhl@xxxxxxxxx>; dledford@xxxxxxxxxx;
>jgg@xxxxxxxxxxxx
>Cc: linux-rdma@xxxxxxxxxxxxxxx; BMT@xxxxxxxxxxxxxx
>Subject: RE: [PATCH v2 1/2] RDMA/Core: add
>RDMA_NLDEV_CMD_NEWLINK/DELLINK support
>
>Hey Mike:
>
>> >Add support for new LINK messages to allow adding and deleting rdma
>> >interfaces.  This will be used initially for soft rdma drivers which
>> >instantiate device instances dynamically by the admin specifying a
>> >netdev device to use.  The rdma_rxe module will be the first user of
>> >these messages.
>> >
>> >The design is modeled after RTNL_NEWLINK/DELLINK:  rdma drivers
>> >register with the rdma core if they provide link add/delete functions.
>> >Each driver registers with a unique "type" string, that is used to
>> >dispatch messages coming from user space.  A new RDMA_NLDEV_ATTR is
>> >defined for the "type" string.  User mode will pass 3 attributes in a
>> >NEWLINK message: RDMA_NLDEV_ATTR_IBDEV_NAME for the desired
>> rdma
>> >device
>> >name to be created, RDMA_NLDEV_ATTR_LINK_TYPE for the "type" of link
>> >being added, and RDMA_NLDEV_ATTR_NDEV_NAME for the net_device
>> >interface
>> >to use for this link. The DELLINK message will contain the IBDEV_NAME
>> >and LINK_TYPE attributes.
>> >
>> >Signed-off-by: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
>> >Reviewed-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
>> >---
>> > drivers/infiniband/core/nldev.c  | 136
>> >+++++++++++++++++++++++++++++++++++++++
>> > include/rdma/rdma_netlink.h      |  12 ++++
>> > include/uapi/rdma/rdma_netlink.h |   9 ++-
>> > 3 files changed, 156 insertions(+), 1 deletion(-)
>> >
>> >diff --git a/drivers/infiniband/core/nldev.c
>> b/drivers/infiniband/core/nldev.c
>> >index 573399e3ccc1..926638cd52bc 100644
>> >--- a/drivers/infiniband/core/nldev.c
>> >+++ b/drivers/infiniband/core/nldev.c
>> >@@ -33,6 +33,7 @@
>> > #include <linux/module.h>
>> > #include <linux/pid.h>
>> > #include <linux/pid_namespace.h>
>> >+#include <linux/mutex.h>
>> > #include <net/netlink.h>
>> > #include <rdma/rdma_cm.h>
>> > #include <rdma/rdma_netlink.h>
>> >@@ -107,6 +108,8 @@
>> > 	[RDMA_NLDEV_ATTR_DRIVER_U32]		= { .type = NLA_U32 },
>> > 	[RDMA_NLDEV_ATTR_DRIVER_S64]		= { .type = NLA_S64 },
>> > 	[RDMA_NLDEV_ATTR_DRIVER_U64]		= { .type = NLA_U64 },
>> >+	[RDMA_NLDEV_ATTR_LINK_TYPE]		= { .type =
>> >NLA_NUL_STRING,
>> >+				    .len = RDMA_NLDEV_ATTR_ENTRY_STRLEN
>> >},
>> > };
>> >
>> > static int put_driver_name_print_type(struct sk_buff *msg, const char
>> >*name,
>> >@@ -1103,6 +1106,131 @@ static int nldev_res_get_pd_dumpit(struct
>> >sk_buff *skb,
>> > 	return res_get_common_dumpit(skb, cb, RDMA_RESTRACK_PD);
>> > }
>> >
>> >+static LIST_HEAD(link_ops);
>> >+static DEFINE_MUTEX(link_ops_mutex);
>> >+
>> >+static const struct rdma_link_ops *link_ops_get(const char *type)
>> >+{
>>
>> _get() usually seems like a reference counted mechanism.
>>
>> Would this be better named as _ops_find()?
>>
>
>Perhaps, but see my answers below.  I think we need the reference.
>
>> >+	const struct rdma_link_ops *ops;
>> >+
>> >+	mutex_lock(&link_ops_mutex);
>> >+	list_for_each_entry(ops, &link_ops, list) {
>> >+		if (!strcmp(ops->type, type))
>> >+			goto out;
>> >+	}
>> >+	ops = NULL;
>> >+out:
>> >+	mutex_unlock(&link_ops_mutex);
>> >+	return ops;
>> >+}
>> >+
>> >+void rdma_link_register(struct rdma_link_ops *ops)
>> >+{
>> >+	if (link_ops_get(ops->type)) {
>> >+		WARN_ONCE("Duplicate rdma_link_ops! %s\n", ops->type);
>> >+		return;
>> >+	}
>> >+	mutex_lock(&link_ops_mutex);
>> >+	list_add(&ops->list, &link_ops);
>> >+	mutex_unlock(&link_ops_mutex);
>> >+}
>> >+EXPORT_SYMBOL(rdma_link_register);
>> >+
>> >+void rdma_link_unregister(struct rdma_link_ops *ops)
>> >+{
>> >+	mutex_lock(&link_ops_mutex);
>> >+	list_del(&ops->list);
>> >+	mutex_unlock(&link_ops_mutex);
>> >+}
>> >+EXPORT_SYMBOL(rdma_link_unregister);
>>
>> Will there be any issues with a link being unregistered while in use?
>>
>
>I guess what could happen is a NL message arrives and the ops ptr is
>fetched, but before it is used, the link is unregistered because the module
>is unloading.  Then the function pointer is dereferenced and the memory has
>been freed and the module unloaded!
>
>Perhaps we just need to reference the module refcnt in link_ops_get() and
>add a link_ops_put() that deref's the module.

That was kinda what I was thinking.

>Jason, is this a good approach?
>
>> >+
>> >+static int nldev_newlink(struct sk_buff *skb, struct nlmsghdr *nlh,
>> >+			  struct netlink_ext_ack *extack)
>> >+{
>> >+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
>> >+	char ibdev_name[IB_DEVICE_NAME_MAX];
>> >+	const struct rdma_link_ops *ops;
>> >+	char ndev_name[IFNAMSIZ];
>> >+	char type[IFNAMSIZ];
>> >+	int err;
>> >+
>> >+	err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
>> >+			  nldev_policy, extack);
>> >+	if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
>> >+	    !tb[RDMA_NLDEV_ATTR_LINK_TYPE] ||
>> >!tb[RDMA_NLDEV_ATTR_NDEV_NAME]) {
>> >+		err = -EINVAL;
>>
>> You are losing the information of nlmsg_parse() failing, is that ok?
>>
>
>I think  any of the the possible failures map to EINVAL anyway:  bad
>attributes or missing attributes.

I followed nlmsg_parse down, and it looks like most of the errors are
EINVAL, but it does return an ERANGE as well.

>
>> >+		goto err_out;
>>
>> Is err_out really necessary?  Maybe just return?
>>
>
>Since there is no unwind code, I guess each of these failures could be just
>a return.  But then if we end up adding unwind code, we'd want the goto's
>back in.

Either way work for me.  Done without the goto seems to be preferred if
there is no unwind.

>> >+	}
>> >+	if (nla_len(tb[RDMA_NLDEV_ATTR_DEV_NAME]) >
>> >sizeof(ibdev_name) ||
>> >+	    nla_len(tb[RDMA_NLDEV_ATTR_LINK_TYPE]) > sizeof(type) ||
>> >+	    nla_len(tb[RDMA_NLDEV_ATTR_NDEV_NAME]) >
>> >sizeof(ndev_name)) {
>> >+		err = -EINVAL;
>> >+		goto err_out;
>> >+	}
>> >+	nla_strlcpy(ibdev_name, tb[RDMA_NLDEV_ATTR_DEV_NAME],
>> >+		    sizeof(ibdev_name));
>> >+	if (strchr(ibdev_name, '%')) {
>> >+		err = -EINVAL;
>> >+		goto err_out;
>> >+	}
>> >+	nla_strlcpy(type, tb[RDMA_NLDEV_ATTR_LINK_TYPE], sizeof(type));
>> >+	nla_strlcpy(ndev_name, tb[RDMA_NLDEV_ATTR_NDEV_NAME],
>> >+		    sizeof(ndev_name));
>> >+
>> >+	ops = link_ops_get(type);
>> >+#ifdef CONFIG_MODULES
>> >+	if (!ops) {
>> >+		request_module("rdma-link-%s", type);
>> >+		ops = link_ops_get(type);
>> >+	}
>> >+#endif
>> >+	if (!ops) {
>> >+		err = -ENODEV;
>> >+		goto err_out;
>> >+	}
>> >+
>> >+	err = ops->newlink(ibdev_name, ndev_name);
>> >+err_out:
>> >+	return err;
>> >+}
>> >+
>> >+static int nldev_dellink(struct sk_buff *skb, struct nlmsghdr *nlh,
>> >+			  struct netlink_ext_ack *extack)
>> >+{
>> >+	struct nlattr *tb[RDMA_NLDEV_ATTR_MAX];
>> >+	char ibdev_name[IB_DEVICE_NAME_MAX];
>> >+	const struct rdma_link_ops *ops;
>> >+	char type[IFNAMSIZ];
>> >+	int err;
>> >+
>> >+	err = nlmsg_parse(nlh, 0, tb, RDMA_NLDEV_ATTR_MAX - 1,
>> >+			  nldev_policy, extack);
>> >+	if (err || !tb[RDMA_NLDEV_ATTR_DEV_NAME] ||
>> >+	    !tb[RDMA_NLDEV_ATTR_LINK_TYPE]) {
>> >+		err = -EINVAL;
>> >+		goto err_out;
>> >+	}
>> >+
>> >+	if (nla_len(tb[RDMA_NLDEV_ATTR_DEV_NAME]) >
>> >sizeof(ibdev_name) ||
>> >+	    nla_len(tb[RDMA_NLDEV_ATTR_LINK_TYPE]) > sizeof(type)) {
>> >+		err = -EINVAL;
>> >+		goto err_out;
>> >+	}
>> >+	nla_strlcpy(ibdev_name, tb[RDMA_NLDEV_ATTR_DEV_NAME],
>> >+		    sizeof(ibdev_name));
>> >+	nla_strlcpy(type, tb[RDMA_NLDEV_ATTR_LINK_TYPE], sizeof(type));
>> >+
>> >+	ops = link_ops_get(type);
>> >+	if (!ops) {
>> >+		err = -ENODEV;
>> >+		goto err_out;
>> >+	}
>> >+
>> >+	err = ops->dellink(ibdev_name);
>> >+err_out:
>> >+	return err;
>> >+}
>> >+
>> > static const struct rdma_nl_cbs
>nldev_cb_table[RDMA_NLDEV_NUM_OPS]
>> =
>> >{
>> > 	[RDMA_NLDEV_CMD_GET] = {
>> > 		.doit = nldev_get_doit,
>> >@@ -1112,6 +1240,14 @@ static int nldev_res_get_pd_dumpit(struct
>> sk_buff
>> >*skb,
>> > 		.doit = nldev_set_doit,
>> > 		.flags = RDMA_NL_ADMIN_PERM,
>> > 	},
>> >+	[RDMA_NLDEV_CMD_NEWLINK] = {
>> >+		.doit = nldev_newlink,
>> >+		.flags = RDMA_NL_ADMIN_PERM,
>> >+	},
>> >+	[RDMA_NLDEV_CMD_DELLINK] = {
>> >+		.doit = nldev_dellink,
>> >+		.flags = RDMA_NL_ADMIN_PERM,
>> >+	},
>> > 	[RDMA_NLDEV_CMD_PORT_GET] = {
>> > 		.doit = nldev_port_get_doit,
>> > 		.dump = nldev_port_get_dumpit,
>> >diff --git a/include/rdma/rdma_netlink.h b/include/rdma/rdma_netlink.h
>> >index 70218e6b5187..42daca1589bd 100644
>> >--- a/include/rdma/rdma_netlink.h
>> >+++ b/include/rdma/rdma_netlink.h
>> >@@ -99,4 +99,16 @@ int ibnl_put_attr(struct sk_buff *skb, struct
>nlmsghdr
>> >*nlh,
>> >  * Returns true on success or false if no listeners.
>> >  */
>> > bool rdma_nl_chk_listeners(unsigned int group);
>> >+
>> >+struct rdma_link_ops {
>> >+	struct list_head list;
>> >+	const char *type;
>> >+	int (*newlink)(const char *ibdev_name, const char *ndev_name);
>> >+	int (*dellink)(const char *ibdev_name);
>> >+};
>> >+
>> >+void rdma_link_register(struct rdma_link_ops *ops);
>> >+void rdma_link_unregister(struct rdma_link_ops *ops);
>> >+
>> >+#define MODULE_ALIAS_RDMA_LINK(type) MODULE_ALIAS("rdma-link-
>"
>> >type)
>> > #endif /* _RDMA_NETLINK_H */
>> >diff --git a/include/uapi/rdma/rdma_netlink.h
>> >b/include/uapi/rdma/rdma_netlink.h
>> >index f9c41bf59efc..7feb087ab6fd 100644
>> >--- a/include/uapi/rdma/rdma_netlink.h
>> >+++ b/include/uapi/rdma/rdma_netlink.h
>> >@@ -229,7 +229,9 @@ enum rdma_nldev_command {
>> > 	RDMA_NLDEV_CMD_GET, /* can dump */
>> > 	RDMA_NLDEV_CMD_SET,
>> >
>> >-	/* 3 - 4 are free to use */
>> >+	RDMA_NLDEV_CMD_NEWLINK,
>> >+
>> >+	RDMA_NLDEV_CMD_DELLINK,
>> >
>> > 	RDMA_NLDEV_CMD_PORT_GET = 5, /* can dump */
>>
>> Minor nit...
>>
>> Should the "= 5 " be cleaned up?
>>
>
>Yes.  Will do.
>
>> This may be off topic, but is this going the general mechanism for getting
>> stuff to the specific driver?
>>
>> 1) I.e. create a new API interface with a  call back
>> 2) Add the plumbing to the underlying driver
>>
>> Is there a plan for adding other things?
>>
>> Do you happen to have any thoughts on things like driver specific config
>> information?
>
>The plan is to add link get/set messages to allow tweaking individual link
>attributes.  And yes, the proposal is that the plumbing from this series
>will be further extended to handle that.  So I would see new functions added
>to the link_ops for these.  For iwarp devices, this could allow setting
>whether MPA_CRC and MPA_MARKERS are enabled for a link.

Got it.  Thanks for the insights.

>>
>> Thanks,
>>
>> Mike
>>
>
>Thanks for the review!

Sure thing.  Thanks for moving this stuff forward!

Mike

>Steve
>




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux