On Wed, Feb 19, 2020 at 02:35:09PM -0500, Dennis Dalessandro wrote:
> On 2/19/2020 11:58 AM, Jason Gunthorpe wrote:
> > On Wed, Feb 19, 2020 at 09:14:06AM -0500, Dennis Dalessandro wrote:
> > >
> > > > ABI breakage is a strong word, luckily enough it is not defined at all.
> > > > We never considered dmesg prints, device names, device ordering as an
> > > > ABI. You can't rely on debug features too, they can disappear too.
> > >
> > > Agree, it is a strong word and we can call it what you want. The point is
> > > you should be able to rely on the node description not being changed out
> > > from under you unnecessarily though. We aren't talking about a debug feature
> > > here but a core feature to real world deployments.
> >
> > People really use the node description as some stable name? And then
> > they put the HCA name in it? Why?
>
> I've seen it in multiple places. Including storage configuration files.
> Suffice to say, yes people use it.
>
> > Is that some thing unique to the OPA subnet manager?
>
> I don't think so.
>
> > I don't recall people complaining about this when we introduced
> > rdma-ndd by default and changed all the node descriptions away from
> > the kernel default.
>
> Sure but the reason rdma-ndd exists is because people care about the node
> descriptions.

Yes people do. Give a sys-admin

0x00117501017af5cc

vs

node0170 hca-0

And see which one they get frustrated with.

>
> I can't really speak to the historical adoption of rdma-ndd

I originally wrote it... So I have some history.

> but I believe it was a stand alone package/feature and was a conscious
> decision to use or not as opposed to the one package to rule them all
> rdma-core like we have now.

rdma-ndd was built to solve the race between potential host name changes
and ports coming on line.
The background is that many people use hostnames to describe their nodes
and if they wanted to configure rdma-ndd it would react to new ports
and/or the hostname updates and turn around and update the node
descriptor according to a configuration specified... If the user wanted
to use hostnames they could... Or it could be configured with some
static name if that is what admins wanted. Hostname was just the "most
likely choice".

>
> > Also don't forget the whole thing about the node description is
> > inherently racey, so relying on it is Rather A Bad Idea.
>
> I think that point is well taken and I don't think anyone is against the
> idea of fixing the "hacky" things as you like to say. This one just caught
> people by surprise is all.
>
> > Should we change the default format string of rdma-ndd to something
> > else?
>
> I'm not sure. I can envision situations where a user has updated libraries
> that are happy with the new persistent names but still want the node
> description to not change. If rdma-ndd could do something to keep the node
> desc the same, then in situations like this the device rename would not have
> to be disabled.
>
> Given that we have seen problems with MVAPICH (even with mlx5), libfabric,
> psm2, and I believe open mpi has a similar issue, and that Intel, Amazon,
> RedHat, and Suse are experiencing issues from this I think we should make
> things as flexible as possible to protect users from breakages.
>
> We do want to move in a forward direction though so we don't want to go back
> to the old way unilaterally. I think distros can handle their upgrade
> situations and if we build in protection to rdma-ndd something like a
> specific udev rule for keeping the node desc the same. That gives us the
> flexibility until all the software and use cases catch up.

The use of node descriptor was intended to be entirely up to the
installation in a manner to debug/locate nodes. Not be used in
libraries. I'm surprised that libraries are broken.
Regardless does the old rdma-ndd config exist? Could it be configured
and/or modified to give the old names? When it was written we designed
the default config to give the old names for backwards compatibility.
Apparently this is no longer true?

Ira

>
> -Denny
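
For anyone following along, here is a rough sketch of the format
expansion discussed above, showing why a device rename changes the node
description when the format includes the device name. It is not the
rdma-ndd source. The function name expand_node_desc and the example
hostname and device names are made up for illustration, and the "%h"
(short hostname) and "%d" (device name) specifiers are my assumption
about the format syntax, so check the rdma-ndd man page before relying
on them.

```python
# Illustrative sketch only; not taken from rdma-ndd.
NODE_DESC_MAX_BYTES = 64  # the NodeDescription field holds 64 bytes


def expand_node_desc(fmt: str, hostname: str, device: str) -> str:
    """Expand a node-description format string.

    Assumed specifiers: %h = hostname without domain, %d = device name,
    %% = literal percent sign. Anything else is copied through unchanged.
    """
    short_host = hostname.split(".", 1)[0]
    replacements = {"h": short_host, "d": device, "%": "%"}

    out = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in replacements:
            out.append(replacements[fmt[i + 1]])
            i += 2
        else:
            out.append(fmt[i])
            i += 1

    # Truncate to the field size, dropping any partial trailing character.
    raw = "".join(out).encode("utf-8")[:NODE_DESC_MAX_BYTES]
    return raw.decode("utf-8", errors="ignore")


# Hostname plus device name: the description follows the device name.
print(expand_node_desc("%h %d", "node0170.example.com", "hca-0"))
# After a (hypothetical) rename the description changes with it.
print(expand_node_desc("%h %d", "node0170.example.com", "ibp3s0"))
# A static name is unaffected by hostname or device renames.
print(expand_node_desc("storage-a", "node0170.example.com", "ibp3s0"))
```

With a format that leaves out the device name, or a static string, the
description stays the same across a device rename, which is the kind of
knob being asked about.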