On 1/10/2023 4:16 AM, Shyam Prasad N wrote:
3. I saw that there was a bug in iface_cmp, where we did not do full comparison of addresses to match them. Fixed it here: https://github.com/sprasad-microsoft/smb3-kernel-client/commit/cef2448dc43d1313571e21ce8283bccacf01978e.patch @Tom Talpey Was this your concern with iface_cmp?
Took a look at this and I do have some comments. Regarding the new address comparator cifs_ipaddr_cmp(), why does it return a three-value result? It seems to prefer AF_INET over AF_UNSPEC, and AF_INET6 over AF_INET. Won't this result in selecting the same physical interface more than once? Also, it is comparing the entire contents of the sockaddrs, including padding and scope id's, which have no meaning on this end of the wire. That will lead to mismatch. Regarding the interface comparator, which I'll requote here: +/* + * compare two interfaces a and b + * return 0 if everything matches. This is fine, assuming the address comparator is fixed. Matching everything is the best result. + * return 1 if a has higher link speed, or rdma capable, or rss capable I'm still uneasy about selecting link speed first. If the mount specifies RDMA, I think RDMA should be the preferred parameter. The code you propose would select an ordinary interface, if it were one bps faster. + * return -1 otherwise. Ok on this. :) + */ +static int +iface_cmp(struct cifs_server_iface *a, struct cifs_server_iface *b) +{ + int cmp_ret = 0; + + WARN_ON(!a || !b); + if (a->speed == b->speed) { + if (a->rdma_capable == b->rdma_capable) { + if (a->rss_capable == b->rss_capable) { RSS is meaningless on an RDMA adapter. The RDMA kernel API uses Queue Pairs to direct completion interrupts, totally different and independent from RSS. It's only meaningful if rdma_capable is false. + cmp_ret = cifs_ipaddr_cmp((struct sockaddr *) &a->sockaddr, + (struct sockaddr *) &b->sockaddr); + if (!cmp_ret) + return 0; + else if (cmp_ret > 0) + return 1; Again, I don't agree that the address family has anything to do with preferring an interface. This should return -1. + else + return -1; + } else if (a->rss_capable > b->rss_capable) + return 1; It's good to prefer an RSS-capable interfgace over non-RSS, but again, only if the interface is not RDMA. + else + return -1; + } else if (a->rdma_capable > b->rdma_capable) + return 1; And it's good to prefer RDMA over non-RDMA, but again, it's going to have very strange results if the client starts sending traffic over both interfaces for the same flow! + else + return -1; + } else if (a->speed > b->speed) + return 1; + else + return -1; +} And again, speeds are only one kind of tiebreaker. This code only looks at the RSS and RDMA attributes when the speeds match. The Windows client, for example, *always* prefers RDMA if available. But Windows has no explicit RDMA mode, it always starts with TCP. The Linux client is different, yet this code doesn't seem to check. Personally, I think that without an explicit -o rdma, the code should attempt to connect to any discovered RDMA server interfaces, via RDMA, and use them exclusively if they work. Otherwise, it should stay on TCP. OTOH, if -o rdma was specified, the client should ignore TCP, or at least, not silently fall back to TCP. In other words, these tests are still too simplistic, and too likely to result in unexpected mixes of channels. Should we step back and write a flowchart? Tom.