> From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma- > owner@xxxxxxxxxxxxxxx] On Behalf Of Jason Gunthorpe > Sent: Wednesday, July 22, 2015 5:09 PM > To: Wan, Kaike > Cc: linux-rdma@xxxxxxxxxxxxxxx; Fleck, John; Weiny, Ira > Subject: Re: [PATCH v8 4/4] IB/sa: Route SA pathrecord query through netlink > > On Thu, Jul 09, 2015 at 01:34:28PM -0400, kaike.wan@xxxxxxxxx wrote: > > From: Kaike Wan <kaike.wan@xxxxxxxxx> > > > > This patch routes a SA pathrecord query to netlink first and processes > > the response appropriately. If a failure is returned, the request will > > be sent through IB. The decision whether to route the request to > > netlink first is determined by the presence of a listener for the > > local service netlink multicast group. If the user-space local service > > netlink multicast group listener is not present, the request will be > > sent through IB, just like what is currently being done. > > So, I tested it again, and the UAPI looks pretty reasonable now. > > The netlink validation still needs to be fixed. > > > There is where we might have a problem: nla_parse() allows only one > > entry for each attribute type in the returned array tb[]. If we have > > multiple (say 6) pathrecords returned, each having different flags > > (for different path use), a later one will replace an earlier one in > > tb[]. > > The parse is OK, continue to use the for loop, nla_parse does more than just > extract into tb, it validates all the sizes are correct, ignore tb. > > My testing shows it seems to get into some kind of fast repeating query loop: > > [ 4904.969188] Return answer 0 > [ 4904.969483] Return answer 0 > [ 4904.969703] Return answer 0 > [ 4904.969824] Return answer 0 > [ 4904.969943] Return answer 0 > [ 4904.970058] Return answer 0 > [ 4904.970175] Return answer 0 > [ 4904.970289] Return answer 0 > > diff --git a/drivers/infiniband/core/sa_query.c > b/drivers/infiniband/core/sa_query.c > index 63ea36d05072..e6b0181e7076 100644 > --- a/drivers/infiniband/core/sa_query.c > +++ b/drivers/infiniband/core/sa_query.c > @@ -640,6 +640,8 @@ static void ib_nl_process_good_resolve_rsp(struct > ib_sa_query *query, > } > } > } > + > + printk("Return answer %u\n",status); > query->callback(query, status, mad); > } > > You'll have to figure that out. I'm just doing ping X while running a responder > on the netlink socket, v7 didn't do this, IIRC. I did not see such a repeated query loop. With a modified version of your debugging code: --- a/drivers/infiniband/core/sa_query.c +++ b/drivers/infiniband/core/sa_query.c @@ -653,6 +653,7 @@ static void ib_nl_process_good_resolve_rsp(struct ib_sa_query *query, } } } + printk("Request %u returns answer %u\n", nlh->nlmsg_seq); query->callback(query, status, mad); } I ran rping a few times and never saw such a behavior: [root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26 Request 2 returns answer 0 client DISCONNECT EVENT... [root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26 Request 3 returns answer 0 client DISCONNECT EVENT... [root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26 Request 4 returns answer 0 client DISCONNECT EVENT... [root@phifs011 ~]# rping -c -C 1 -a 10.228.216.26 Request 5 returns answer 0 client DISCONNECT EVENT... [root@phifs011 ~]# The nl sequence increased sequentially. What's the difference? In my debugging patch, I have also other print statements and I have never seen such a loop at all. > > I think it has something to do with the timer as the first request works fine, > then a pause, then a storm. > > Actually, looks like nothing removes nl queries from the timeout loop when > they successfully complete. (ie this series doesn't even have a hope of > working properly) That is incorrect. The first thing in ib_nl_handle_resolve_resp is to find the request and remove it from the ib_nl_rquest_list. the timeout routine will remove the entries that have timed out from the nl request list. > > Please actually test this patch completely and thoroughly before sending > another version. Generally, I tested the patches (along with a debugging patch) with rping, ping, SRP, and another performance path (no debugging patch). I also tested another ibacm patch for setting timeout. When making major changes, I also ran tests for timeout and error response cases. >I'm broadly happy with this, so get your whole team to look > it over agin. > > > + if (query->callback) { > > + head = (const struct nlattr *) nlmsg_data(nlh); > > + len = nlmsg_len(nlh); > > + switch (query->path_use) { > > + case LS_RESOLVE_PATH_USE_UNIDIRECTIONAL: > > + mask = IB_PATH_PRIMARY | IB_PATH_OUTBOUND; > > + break; > > + > > + case LS_RESOLVE_PATH_USE_ALL: > > + case LS_RESOLVE_PATH_USE_GMP: > > + default: > > + mask = IB_PATH_PRIMARY | IB_PATH_GMP | > > + IB_PATH_BIDIRECTIONAL; > > + break; > > + } > > + nla_for_each_attr(curr, head, len, rem) { > > + if (curr->nla_type == LS_NLA_TYPE_PATH_RECORD) { > > + rec = nla_data(curr); > > + /* > > + * Get the first one. In the future, we may > > + * need to get up to 6 pathrecords. > > + */ > > + if (rec->flags & mask) { > > (rec_>flags & mask) == mask > > All requested bits must be set, not just any requested bit. > > A IB_PATH_PRIMARY | IB_PATH_OUTBOUND result does not satisfy the > requirement for LS_RESOLVE_PATH_USE_GMP. > > The goal is to local the one path of many that satisfies the requirement. > Future kernels will direct 6 path answers Got it. > > > +static int ib_nl_handle_set_timeout(struct sk_buff *skb, > > + struct netlink_callback *cb) > > +{ > > + const struct nlmsghdr *nlh = (struct nlmsghdr *)cb->nlh; > > + int timeout, delta, abs_delta; > > + const struct nlattr *attr; > > + unsigned long flags; > > + struct ib_sa_query *query; > > + long delay = 0; > > + struct nlattr *tb[LS_NLA_TYPE_MAX + 1]; > > + int ret; > > + > > + ret = nla_parse(tb, LS_NLA_TYPE_MAX, nlmsg_data(nlh), > nlmsg_len(nlh), > > + NULL); > > + attr = (const struct nlattr *)tb[LS_NLA_TYPE_TIMEOUT]; > > + if (ret || !attr || nla_len(attr) != sizeof(u32)) > > + goto settimeout_out; > > This probably doesn't need the nested attribute, but I'm indifferent. It's not a nested attribute, only a U32 attribute data. I will use a policy to validate the entire response (for pathrecord also). > > Jason > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the > body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at > http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html