Hi Roland-

On 04/02/2010 01:22 PM, Roland Dreier wrote:
> > > The write_ports code will fail both the INET4 and INET6 transport
> > > creation if the transport returns an error when PF_INET6 is specified.
> > > Some transports that do not support INET6 return an error other than
> > > EAFNOSUPPORT.
> >
> > That's the real bug. Any reason the RDMA RPC transport can't return
> > EAFNOSUPPORT in this case?
>
> I think Tom's changelog is misleading. The problem is that the RDMA
> transport actually does support IPv6, but it doesn't support the
> IPV6ONLY option yet. So if NFS/RDMA binds to a port for IPv4, then the
> IPv6 bind fails because of the port collision.
IPV6ONLY is a requirement for RPC over IPv6. If the underlying transport does not support IPV6ONLY, then it cannot properly support RPC over IPv6. It's easy enough to catch IPv6 listener creation calls on such transports and simply return EAFNOSUPPORT until IPV6ONLY support can be provided.
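For illustration, a minimal userspace sketch of that check. The listener hook `rdma_create_listener()` and the capability flag `rdma_supports_v6only` are hypothetical names, not the actual RPC/RDMA transport code, and the real bind is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical capability flag: this RDMA-like transport cannot yet
 * honor IPV6ONLY, so it must not claim IPv6 support.
 */
static const bool rdma_supports_v6only = false;

/*
 * Hypothetical listener-creation hook. Returns 0 on success or a
 * negative errno, following the kernel convention.
 */
static int rdma_create_listener(int family, unsigned short port)
{
	(void)port;	/* binding is omitted in this sketch */

	if (family == PF_INET6 && !rdma_supports_v6only)
		return -EAFNOSUPPORT;	/* caller may fall back to IPv4 only */
	if (family != PF_INET && family != PF_INET6)
		return -EAFNOSUPPORT;	/* unknown family */
	return 0;			/* pretend the bind succeeded */
}
```

With this check in place, rdma_create_listener(PF_INET6, 20049) returns -EAFNOSUPPORT, the one error code that __write_ports() already treats as "no IPv6 here, carry on with IPv4".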
The __write_ports() interface is specifically designed to silently fall back to IPv4-only when IPv6 transport creation fails with EAFNOSUPPORT. I don't see a good reason to change the generic logic in __write_ports() just because a specific transport capability has a problem implementing RPC over IPv6. __write_ports() will do the right thing if the transport returns the correct error code.
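A minimal userspace model of that fallback policy, as a sketch only: `write_ports_sketch()`, `struct listeners`, and the two stub transports are hypothetical and are not the actual __write_ports() code. The `port_collision` stub assumes the IPv6 port clash surfaces as -EADDRINUSE; the thread says only that the bind fails.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>

/* Hypothetical stand-in for a transport's listener-creation hook. */
typedef int (*create_fn)(int family, unsigned short port);

/* Hypothetical record of which listeners ended up open. */
struct listeners {
	bool v4;
	bool v6;
};

/*
 * IPv4 must succeed. IPv6 may fail with -EAFNOSUPPORT, which silently
 * leaves an IPv4-only listener. Any other IPv6 error tears down the
 * IPv4 listener and fails the whole request.
 */
static int write_ports_sketch(create_fn create, unsigned short port,
			      struct listeners *out)
{
	int err;

	out->v4 = out->v6 = false;

	err = create(PF_INET, port);
	if (err < 0)
		return err;
	out->v4 = true;

	err = create(PF_INET6, port);
	if (err == -EAFNOSUPPORT)
		return 0;		/* silent IPv4-only fallback */
	if (err < 0) {
		out->v4 = false;	/* undo the IPv4 listener */
		return err;
	}
	out->v6 = true;
	return 0;
}

/* Transport that reports missing IPv6 support with the expected code. */
static int well_behaved(int family, unsigned short port)
{
	(void)port;
	return family == PF_INET6 ? -EAFNOSUPPORT : 0;
}

/* Transport whose IPv6 bind collides with its own IPv4 port (assumed errno). */
static int port_collision(int family, unsigned short port)
{
	(void)port;
	return family == PF_INET6 ? -EADDRINUSE : 0;
}
```

The well-behaved stub ends up with an IPv4-only listener and a zero return. The colliding stub loses both listeners, which is the failure described in Tom's changelog, without any change to the generic logic.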
> Implementing the IPV6ONLY option for RDMA binding is probably not feasible
> for 2.6.34, so the best band-aid for now seems to be Tom's patch.
My recent experience with similar changes suggests that the specific solution Tom proposed will trigger extra bug reports and e-mails, because the change appears to affect non-RDMA transports as well. The printk it adds might fire, for example, for INET transports on systems that are built without IPv6 support, or where ipv6.ko is blacklisted in user space.
In other words, I agree that there's a bug that should be addressed in 2.6.34, and I don't have any problem with setting up only an IPv4 listener in this case. But I think adding a printk that fires for all transports is problematic.
It would be better to address this in the RPC/RDMA transport capability, not in generic upper-level logic. We already have correct behavior in __write_ports(), and the RPC/RDMA transport capability should be changed to take advantage of it.
-- 
chuck[dot]lever[at]oracle[dot]com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html