Hi Tom,

As always, thanks for the detailed review. :)

On Wed, Jan 11, 2023 at 10:32 PM Tom Talpey <tom@xxxxxxxxxx> wrote:
>
> On 1/10/2023 4:16 AM, Shyam Prasad N wrote:
> > 3.
> > I saw that there was a bug in iface_cmp, where we did not do full
> > comparison of addresses to match them.
> > Fixed it here:
> > https://github.com/sprasad-microsoft/smb3-kernel-client/commit/cef2448dc43d1313571e21ce8283bccacf01978e.patch
> >
> > @Tom Talpey Was this your concern with iface_cmp?
>
> Took a look at this and I do have some comments.
>
> Regarding the new address comparator cifs_ipaddr_cmp(), why
> does it return a three-value result? It seems to prefer
> AF_INET over AF_UNSPEC, and AF_INET6 over AF_INET. Won't
> this result in selecting the same physical interface more
> than once?

The idea is to return consistent values for a given pair, so that
interfaces with exactly the same advertised speed are ordered by
address family first, and then by what memcmp returns.

> Also, it is comparing the entire contents of the sockaddrs,
> including padding and scope id's, which have no meaning on
> this end of the wire. That will lead to mismatch.

My familiarity with IPv6 is relatively low. Steve actually pointed me
to NFS client code which does a similar comparison. Let me redo this
part.

>
> Regarding the interface comparator, which I'll requote here:
>
> +/*
> + * compare two interfaces a and b
> + * return 0 if everything matches.
>
> This is fine, assuming the address comparator is fixed. Matching
> everything is the best result.
>
> + * return 1 if a has higher link speed, or rdma capable, or rss capable
>
> I'm still uneasy about selecting link speed first. If the mount
> specifies RDMA, I think RDMA should be the preferred parameter.
> The code you propose would select an ordinary interface, if it
> were one bps faster.

I could do that. But that only changes the order in which the
interfaces are sorted.
I think the place where you want this change more is where the
interface is actually getting selected for a new connection. I'll do
this.

>
> + * return -1 otherwise.
>
> Ok on this. :)
>
> + */
> +static int
> +iface_cmp(struct cifs_server_iface *a, struct cifs_server_iface *b)
> +{
> +        int cmp_ret = 0;
> +
> +        WARN_ON(!a || !b);
> +        if (a->speed == b->speed) {
> +                if (a->rdma_capable == b->rdma_capable) {
> +                        if (a->rss_capable == b->rss_capable) {
>
> RSS is meaningless on an RDMA adapter. The RDMA kernel API uses
> Queue Pairs to direct completion interrupts, totally different
> and independent from RSS. It's only meaningful if rdma_capable
> is false.

Ok. I was not sure whether they could co-exist, and decided to make
this extra comparison anyway. Good that you shared this info. But I
don't think this code hurts.

>
> +                                cmp_ret = cifs_ipaddr_cmp((struct sockaddr *) &a->sockaddr,
> +                                                          (struct sockaddr *) &b->sockaddr);
> +                                if (!cmp_ret)
> +                                        return 0;
> +                                else if (cmp_ret > 0)
> +                                        return 1;
>
> Again, I don't agree that the address family has anything to do
> with preferring an interface. This should return -1.

As explained above, this just decides the ordering of the interface
list. The items at the head of the list do get a slight preference:
the list is sorted first on speed, then on capabilities, then on
address family, then on the actual address value.

>
> +                                else
> +                                        return -1;
> +                        } else if (a->rss_capable > b->rss_capable)
> +                                return 1;
>
> It's good to prefer an RSS-capable interface over non-RSS, but
> again, only if the interface is not RDMA.
>
> +                        else
> +                                return -1;
> +                } else if (a->rdma_capable > b->rdma_capable)
> +                        return 1;
>
> And it's good to prefer RDMA over non-RDMA, but again, it's going
> to have very strange results if the client starts sending traffic
> over both interfaces for the same flow!

I understand your concern on this, and I can fix it.
>
> +                else
> +                        return -1;
> +        } else if (a->speed > b->speed)
> +                return 1;
> +        else
> +                return -1;
> +}
>
> And again, speeds are only one kind of tiebreaker. This code
> only looks at the RSS and RDMA attributes when the speeds
> match. The Windows client, for example, *always* prefers RDMA
> if available. But Windows has no explicit RDMA mode, it always
> starts with TCP. The Linux client is different, yet this code
> doesn't seem to check.
>
> Personally, I think that without an explicit -o rdma, the code
> should attempt to connect to any discovered RDMA server
> interfaces, via RDMA, and use them exclusively if they work.
> Otherwise, it should stay on TCP.

So even for this case, no mixing?

>
> OTOH, if -o rdma was specified, the client should ignore TCP,
> or at least, not silently fall back to TCP.

Understood.

>
> In other words, these tests are still too simplistic, and too
> likely to result in unexpected mixes of channels. Should we step
> back and write a flowchart?

I think I've understood your concerns. I'll address all of them in the
next version of my patch, and then we can check whether we're on the
same page.

>
> Tom.

--
Regards,
Shyam