On 2/19/2020 11:58 AM, Jason Gunthorpe wrote:
> On Wed, Feb 19, 2020 at 09:14:06AM -0500, Dennis Dalessandro wrote:
>>> ABI breakage is a strong word, luckily enough it is not defined at all.
>>> We never considered dmesg prints, device names, device ordering as an
>>> ABI. You can't rely on debug features either; they can disappear too.
>>
>> Agreed, it is a strong word, and we can call it whatever you want. The
>> point is you should be able to rely on the node description not being
>> changed out from under you unnecessarily, though. We aren't talking about
>> a debug feature here but a core feature of real-world deployments.
>
> People really use the node description as some stable name? And then
> they put the HCA name in it? Why?

I've seen it in multiple places, including storage configuration files.
Suffice to say, yes, people use it.

> Is that something unique to the OPA subnet manager?

I don't think so.

> I don't recall people complaining about this when we introduced
> rdma-ndd by default and changed all the node descriptions away from
> the kernel default.

Sure, but rdma-ndd exists because people care about the node
descriptions. I can't really speak to the historical adoption of
rdma-ndd, but I believe it was a standalone package/feature, and using
it was a conscious decision, as opposed to the one-package-to-rule-them-all
rdma-core we have now.

> Also don't forget the whole thing about the node description is
> inherently racy, so relying on it is Rather A Bad Idea.

I think that point is well taken, and I don't think anyone is against the
idea of fixing the "hacky" things, as you like to say. This one just
caught people by surprise, is all.

> Should we change the default format string of rdma-ndd to something
> else?

I'm not sure. I can envision situations where a user has updated
libraries that are happy with the new persistent names but still wants
the node description to stay the same. If rdma-ndd could do something to
keep the node desc unchanged, then in situations like this the device
rename would not have to be disabled.

Given that we have seen problems with MVAPICH (even with mlx5),
libfabric, psm2, and I believe Open MPI has a similar issue, and that
Intel, Amazon, Red Hat, and SUSE are experiencing issues from this, I
think we should make things as flexible as possible to protect users
from breakage.

We do want to move in a forward direction, though, so we don't want to go
back to the old way unilaterally. I think distros can handle their
upgrade situations, and we can build protection into rdma-ndd, something
like a specific udev rule for keeping the node desc the same. That gives
us flexibility until all the software and use cases catch up.

-Denny