> > > > > > On 2/23/2019 3:26 AM, Leon Romanovsky wrote: > > > > On Thu, Feb 21, 2019 at 08:19:03AM -0800, Steve Wise wrote: > > > >> This function sends the constructed netlink message and then > > > >> receives the response, displaying any error text. > > > >> > > > >> Change 'rdma dev set' to use it. > > > >> > > > >> Signed-off-by: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx> > > > >> --- > > > >> rdma/dev.c | 2 +- > > > >> rdma/rdma.h | 1 + > > > >> rdma/utils.c | 21 +++++++++++++++++++++ > > > >> 3 files changed, 23 insertions(+), 1 deletion(-) > > > >> > > > >> diff --git a/rdma/dev.c b/rdma/dev.c > > > >> index 60ff4b31e320..d2949c378f08 100644 > > > >> --- a/rdma/dev.c > > > >> +++ b/rdma/dev.c > > > >> @@ -273,7 +273,7 @@ static int dev_set_name(struct rd *rd) > > > >> mnl_attr_put_u32(rd->nlh, RDMA_NLDEV_ATTR_DEV_INDEX, rd- > > > >dev_idx); > > > >> mnl_attr_put_strz(rd->nlh, RDMA_NLDEV_ATTR_DEV_NAME, > > > rd_argv(rd)); > > > >> > > > >> - return rd_send_msg(rd); > > > >> + return rd_sendrecv_msg(rd, seq); > > > >> } > > > >> > > > >> static int dev_one_set(struct rd *rd) > > > >> diff --git a/rdma/rdma.h b/rdma/rdma.h > > > >> index 547bb5749a39..20be2f12c4f8 100644 > > > >> --- a/rdma/rdma.h > > > >> +++ b/rdma/rdma.h > > > >> @@ -115,6 +115,7 @@ bool rd_check_is_key_exist(struct rd *rd, const > > > char *key); > > > >> */ > > > >> int rd_send_msg(struct rd *rd); > > > >> int rd_recv_msg(struct rd *rd, mnl_cb_t callback, void *data, uint32_t > > > seq); > > > >> +int rd_sendrecv_msg(struct rd *rd, unsigned int seq); > > > >> void rd_prepare_msg(struct rd *rd, uint32_t cmd, uint32_t *seq, > > uint16_t > > > flags); > > > >> int rd_dev_init_cb(const struct nlmsghdr *nlh, void *data); > > > >> int rd_attr_cb(const struct nlattr *attr, void *data); > > > >> diff --git a/rdma/utils.c b/rdma/utils.c > > > >> index 069d44fece10..a6f2826c9605 100644 > > > >> --- a/rdma/utils.c > > > >> +++ b/rdma/utils.c > > > >> @@ -664,6 +664,27 @@ int rd_recv_msg(struct rd *rd, mnl_cb_t > callback, > > > void *data, unsigned int seq) > > > >> return ret; > > > >> } > > > >> > > > >> +static int null_cb(const struct nlmsghdr *nlh, void *data) > > > >> +{ > > > >> + return MNL_CB_OK; > > > >> +} > > > >> + > > > >> +int rd_sendrecv_msg(struct rd *rd, unsigned int seq) > > > >> +{ > > > >> + int ret; > > > >> + > > > >> + ret = rd_send_msg(rd); > > > >> + if (ret) { > > > >> + perror(NULL); > > > > This is more or less already done in rd_send_msg() and that function > > > > prints something in case of execution error. So the missing piece > > > > is to update rd_recv_msg(), so all places will "magically" print errors > > > > and not only dev_set_name(). > > > > > > Yea ok. > > > > > > > dev_set_name() doesn't call rd_recv_msg(). So you're suggesting I fix up > > rd_recv_msg() to display errors and make dev_set_name() call > rd_recv_msg() > > with the null_cb function? You sure that's the way to go? > > I'm sure that we need to fix dev_set_name(), everything else I'm not sure. > > Thanks Hey Leon, adding this to rd_recv_msg(): @@ -693,10 +693,28 @@ int rd_recv_msg(struct rd *rd, mnl_cb_t callback, void *data, unsigned int seq) ret = mnl_cb_run(buf, ret, seq, portid, callback, data); } while (ret > 0); + if (ret < 0) + perror(NULL); + mnl_socket_close(rd->nl); return ret; } Results in unexpected errors being logged when doing a query such as: [root@stevo1 iproute2]# ./rdma/rdma res show qp lqpn 176 error: Invalid argument link mlx5_0/1 lqpn 176 type UD state RTS sq-psn 0 comm [ib_core] error: Invalid argument error: No such file or directory error: Invalid argument error: No such file or directory It appears the "invalid argument" errors are due to rdmatool sending a RDMA_NLDEV_CMD_RES_QP_GET command using the doit kernel method to allow querying for just a QP with lqpn = 176. However, rdmatool isn't passing a port index in the messages that generate the "invalid argument" error from the kernel. IE you must provide a device index and port index when issuing a doit command vs a dumpit command. I think. This error was not found because rd_recv_msg() never displayed any errors previously. Further, the RES_FUNC() massive macro has code that will retry a failed doit call with a dumpit call. I think _##name() should distinguish between failures reported by the kernel doit function vs failures because no doit function exists. Not sure how to support that. static inline int _##name(struct rd *rd) \ { \ uint32_t idx; \ int ret; \ if (id) { \ ret = rd_doit_index(rd, &idx); \ if (ret) { \ ret = _res_send_idx_msg(rd, command, \ name##_idx_parse_cb, \ idx, id); \ if (!ret) \ return ret; \ /* Fallback for old systems without .doit callbacks */ \ } \ } \ return _res_send_msg(rd, command, name##_parse_cb); \ } \ The "no such file or dir" errors are being returned because, in my setup, there are 2 other links that do not have lqpn 176. So there are 2 issues uncovered by adding generic printing of errors in rd_recv_msg() 1) the doit code in rdmatool is generating requests for a doit method in the kernel w/o providing a port index. 2) some paths in rdmatool should not print "benign" errors like filtering on a GET command causing a "does not exist" error returned by the kernel doit func. #1 is a bug, IMO. Can you propose a fix? #2 could be solved by adding an error callback func passed to rd_recv_msg(). Then the RES_FUNC() functions could parse errors like "no such file or dir" when doing a filtered query and silently drop them. And functions like dev_set_name() would display all errors returned because there are no expected errors other than "success". Steve.