Hi Ian,

On Mon, Jul 10, 2023 at 07:11:09PM +0800, Ian Kent wrote:
> On 10/7/23 18:29, Salvatore Bonaccorso wrote:
> > Hi,
> >
> > On Mon, Jul 10, 2023 at 04:33:24PM +0800, Ian Kent wrote:
> > > On 10/7/23 14:24, Salvatore Bonaccorso wrote:
> > > > Hi Ian,
> > > >
> > > > On Mon, Jul 10, 2023 at 10:05:12AM +0800, Ian Kent wrote:
> > > > > On 9/7/23 22:57, Salvatore Bonaccorso wrote:
> > > > > > Hi
> > > > > >
> > > > > > The following regression was noticed while updating a client
> > > > > > running Debian buster (autofs 5.1.2 based) to Debian bullseye
> > > > > > (autofs 5.1.7 based), and it was verified to still be present
> > > > > > with 5.1.8. The following setup is present:
> > > > > >
> > > > > > There is an NFS server, dual-stacked, with both public IPv4 and
> > > > > > IPv6 addresses resolvable in DNS. As I cannot put the public IPs
> > > > > > here in the report, let's assume it is called nfs-server with the
> > > > > > addresses 192.168.122.188 and fc00:192:168:122::188.
> > > > > >
> > > > > > The client, initially running Debian buster, is not dual-stacked;
> > > > > > it has only IPv4 and correct routing, e.g. pinging nfs-server
> > > > > > goes to its IPv4 address, and any TCP connection goes over IPv4
> > > > > > as well (e.g. ssh -vvv nfs-server).
> > > > > >
> > > > > > The automount configuration is fairly simple:
> > > > > >
> > > > > > auto.master:
> > > > > > [...]
> > > > > > /home /etc/auto/homes01
> > > > > > [...]
> > > > > >
> > > > > > and /etc/auto/homes01:
> > > > > > [...]
> > > > > > * nfs-server:/srv/homes/homes01/&
> > > > > > [...]
> > > > > >
> > > > > > (Note: the real configuration is slightly more complex, with
> > > > > > nosuid,rw,hard,proto=tcp as options, but the issue was verified
> > > > > > as well after stripping down the options, even dropping
> > > > > > proto=tcp.) I hope I anonymized the logs below correctly without
> > > > > > messing up the paths.
> > > > > >
> > > > > > After the update to the autofs 5.1.7 based version there is a 10s
> > > > > > delay on mounts from the server.
> > > > > >
> > > > > > In the following logs I had to slightly anonymize the names, but
> > > > > > I think the delay is visible clearly enough from the timestamps.
> > > > > > Before the update, with autofs 5.1.2:
> > > > > >
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: handle_packet: type = 3
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: handle_packet_missing_indirect: token 26, name username, request pid 5285
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: attempting to mount entry /home/username
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: lookup_mount: lookup(program): username -> -nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: parse_mount: parse(sun): expanded entry: -nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: parse_mount: parse(sun): gathered options: nosuid,rw,hard,proto=tcp
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: parse_mount: parse(sun): dequote("nfs-server:/srv/homes/homes01/username") -> nfs-server:/srv/homes/homes01/username
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: parse_mount: parse(sun): core of entry: options=nosuid,rw,hard,proto=tcp, loc=nfs-server:/srv/homes/homes01/username
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: sun_mount: parse(sun): mounting root /home, mountpoint username, what nfs-server:/srv/homes/homes01/username, fstype nfs, options nosuid,rw,hard,proto=tcp
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): root=/home name=username what=nfs-server:/srv/homes/homes01/username, fstype=nfs, options=nosuid,rw,hard,proto=tcp
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): nfs options="nosuid,rw,hard,proto=tcp", nobind=0, nosymlink=0, ro=0
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: get_nfs_info: called with host nfs-server(192.168.122.188) proto 6 version 0x30
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: get_nfs_info: nfs v3 rpc ping time: 0.000000
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: get_nfs_info: host nfs-server cost 0 weight 0
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: prune_host_list: selected subset of hosts that support NFS3 over TCP
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): calling mkdir_path /home/username
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): calling mount -t nfs -s -o nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username /home/username
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): mounted nfs-server:/srv/homes/homes01/username on /home/username
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: dev_ioctl_send_ready: token = 26
> > > > > > Jul 09 11:54:41 clienthost automount[5143]: mounted /home/username
> > > > > >
> > > > > > After the update to 5.1.7 (or 5.1.8):
> > > > > >
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: handle_packet: type = 3
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: handle_packet_missing_indirect: token 33, name username, request pid 7104
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: attempting to mount entry /home/username
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: lookup_mount: lookup(program): username -> -nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: parse_mount: parse(sun): expanded entry: -nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: parse_mount: parse(sun): gathered options: nosuid,rw,hard,proto=tcp
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: parse_mount: parse(sun): dequote("nfs-server:/srv/homes/homes01/username") -> nfs-server:/srv/homes/homes01/username
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: parse_mount: parse(sun): core of entry: options=nosuid,rw,hard,proto=tcp, loc=nfs-server:/srv/homes/homes01/username
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: sun_mount: parse(sun): mounting root /home, mountpoint username, what nfs-server:/srv/homes/homes01/username, fstype nfs, options nosuid,rw,hard,proto=tcp
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: mount(nfs): root=/home name=username what=nfs-server:/srv/homes/homes01/username, fstype=nfs, options=nosuid,rw,hard,proto=tcp
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: mount(nfs): nfs options="nosuid,rw,hard,proto=tcp", nobind=0, nosymlink=0, ro=0
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: get_nfs_info: called with host nfs-server(192.168.122.188) proto 6 version 0x20
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: get_nfs_info: nfs v3 rpc ping time: 0.000184
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: get_nfs_info: host nfs-server cost 183 weight 0
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: prune_host_list: selected subset of hosts that support NFS3 over TCP
> > > > > > Jul 09 11:56:23 clienthost automount[6952]: get_supported_ver_and_cost: called with host nfs-server(fc00:192:168:122::188) version 0x20
> > > > > > Jul 09 11:56:33 clienthost automount[6952]: get_supported_ver_and_cost: rpc ping time 0.000352
> > > > > > Jul 09 11:56:33 clienthost automount[6952]: get_supported_ver_and_cost: cost 352 weight 0
> > > > > > Jul 09 11:56:33 clienthost automount[6952]: mount_mount: mount(nfs): calling mkdir_path /home/username
> > > > > > Jul 09 11:56:33 clienthost automount[6952]: mount(nfs): calling mount -t nfs -s -o nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username /home/username
> > > > > > Jul 09 11:56:33 clienthost automount[6952]: mount_mount: mount(nfs): mounted nfs-server:/srv/homes/homes01/username on /home/username
> > > > > > Jul 09 11:56:33 clienthost automount[6952]: dev_ioctl_send_ready: token = 33
> > > > > > Jul 09 11:56:33 clienthost automount[6952]: mounted /home/username
> > > > > >
> > > > > > While get_nfs_info is first called again with host
> > > > > > nfs-server(192.168.122.188), there is then a
> > > > > > get_supported_ver_and_cost call with the IPv6 address, hanging
> > > > > > for 10s.
> > > > >
> > > > > The latter call timed out; its timeout is 10 seconds for addresses
> > > > > that look like they are on the local network, so the IPv6 address
> > > > > is not responding.
> > > > >
> > > > > I know that 10 seconds is a bit long for interactive use, perhaps
> > > > > it should be a bit less, but it's better than the 2 minutes we
> > > > > would often see ...
> > > >
> > > > Yes, fully understood. We got complaints from clients having to wait
> > > > 10s for their home directories to be mounted; this is how we started
> > > > investigating it.
> > > >
> > > > > I guess you are saying it should be responding ... not sure what
> > > > > I can do about that. I will need to reproduce the problem, and
> > > > > IPv6 is still not something I have a setup easily available for.
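
For reference, the availability check discussed above is essentially an RPC
"ping" with an explicit timeout. Below is a minimal sketch of such a ping,
assuming libtirpc's clnt_create_timed() and the standard NULL procedure; the
build flags, the locally defined NFS_PROG/NFS_VERS constants and the 10
second value (mirroring the timeout mentioned above) are assumptions for
illustration, this is not the actual autofs code:

/* Build with something like: gcc -I/usr/include/tirpc rpc_ping.c -ltirpc
 * (include/library paths assumed, Debian-style layout). */
#include <stdio.h>
#include <rpc/rpc.h>

#define NFS_PROG 100003   /* well-known NFS RPC program number */
#define NFS_VERS 3

int main(int argc, char **argv)
{
        struct timeval timeout = { 10, 0 };  /* 10s budget, as discussed above */
        CLIENT *client;
        enum clnt_stat status;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <host>\n", argv[0]);
                return 2;
        }

        /* Create a TCP client handle for NFSv3 with an explicit timeout. */
        client = clnt_create_timed(argv[1], NFS_PROG, NFS_VERS, "tcp", &timeout);
        if (!client) {
                clnt_pcreateerror(argv[1]);
                return 1;
        }

        /* Procedure 0 (NULLPROC) takes and returns nothing; calling it is
         * the usual way to check whether the service responds at all. */
        status = clnt_call(client, NULLPROC,
                           (xdrproc_t)xdr_void, (caddr_t)NULL,
                           (xdrproc_t)xdr_void, (caddr_t)NULL, timeout);
        printf("rpc ping %s: %s\n", argv[1], clnt_sperrno(status));

        clnt_destroy(client);
        return status == RPC_SUCCESS ? 0 : 1;
}

Run against the server's IPv6 address from the IPv4-only client, a call like
this would be expected to sit in the connect phase until the timeout expires,
which matches the 10s gap between the two get_supported_ver_and_cost lines in
the log above.
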
> > > > I'm not sure. I was actually surprised, given the client is
> > > > IPv4-only, that with the updated autofs there was actually a call
> > > > to the IPv6 address at all.
> > > >
> > > > > > If I just revert c578e5b37c3c ("autofs-5.1.5 - Fix NFS mount
> > > > > > from IPv6 addresses") then the mount is again quick, but it logs
> > > > > > an additional
> > > > > >
> > > > > > Jul 09 16:31:34 clienthost automount[12045]: check_address_proto: requested protocol does not match address
> > > > > >
> > > > > > so that is likely not the right fix.
> > > > >
> > > > > That does raise some interesting alternatives though.
> > > > >
> > > > > I'm not sure what the function check_address_proto() is seeing but
> > > > > it essentially (in 5.1.8) says "if the rpc request is for IPv6 and
> > > > > the host address length is not set to the length of an IPv6
> > > > > address, bail out". So whatever is wrong there is probably still
> > > > > wrong, just not happening due to the check returning a fail.
> > > > >
> > > > > If I can set up an environment I can check that out further, but
> > > > > it sounds a bit suspicious. I can of course check the code to see
> > > > > if there's an obvious setup mistake ...
> > > >
> > > > If I can provide any help in debugging this, or you can give me
> > > > hints on what I can try, let me know. I will need to release the
> > > > client host back to the customers soonish, but I will try to make a
> > > > similar test client available for further testing.
> > > >
> > > > > > I suspect mounts from servers which are dual-stacked, but where
> > > > > > the route to them is IPv4 only, are not correctly handled after
> > > > > > 90532146bc0e ("autofs-5.1.3 - fix ipv6 proto option handling")
> > > > > > and c578e5b37c3c ("autofs-5.1.5 - Fix NFS mount from IPv6
> > > > > > addresses").
> > > > >
> > > > > I'm not sure that these changes affect the routing at all, at
> > > > > least not due to autofs itself, and the above is a proto TCP
> > > > > request so it shouldn't be the server simply not responding ...
> > > > > not sure why it's happening.
> > > > >
> > > > > The difference between the first and second log traces looks like
> > > > > the dns lookup has returned two hosts for the second availability
> > > > > ping, an IPv4 and an IPv6 address. Not sure why an IPv6 address
> > > > > isn't present in the first availability ping.
> > > > >
> > > > > It might also be that NFSv3 is being checked for availability
> > > > > rather than IPv4. I've assumed that NFS v3 works over IPv6 ...
> > > >
> > > > About NFSv3: this is actually independent; it is the same if I use
> > > > NFSv4 (with the -vers=4 option, which we have in the original
> > > > setup). I just tried to strip down as many options as possible for
> > > > the reproducing case, though it is still not minimal, and we can
> > > > actually omit any of the options.
> > > >
> > > > > > (Note the package in Debian *is* built with --with-libtirpc.)
> > > > > >
> > > > > > Does any of the above make sense?
> > > > >
> > > > > Sadly it does make sense but, as I say, I don't know what's
> > > > > causing it.
> > > >
> > > > Ok. Again, I'm eager to help as much as possible to pinpoint the
> > > > issue, but I might need some hints on what I should/can try to
> > > > isolate the problem. Is there a potential misconfiguration on the
> > > > client side which causes problems for autofs in this setup?
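
To illustrate Ian's description of check_address_proto() above, here is a
rough sketch of that kind of check (a paraphrase of the behaviour as
described, not the actual autofs source; the function and parameter names
are hypothetical): an IPv6 request only proceeds if the stored address
length is that of an IPv6 socket address, and likewise for IPv4.

#include <stdbool.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Rough illustration of the check described above, not the autofs code:
 * reject the request when the requested protocol family does not match
 * the length of the address we actually hold for the host. */
static bool request_matches_address(int request_family, socklen_t addr_len)
{
        if (request_family == AF_INET6)
                return addr_len == sizeof(struct sockaddr_in6);
        if (request_family == AF_INET)
                return addr_len == sizeof(struct sockaddr_in);
        return false;
}

In the reverted code a check of this kind returns a failure (hence the
"requested protocol does not match address" message), so the IPv6 probe is
skipped early; as Ian notes, whatever is wrong underneath is then simply
never reached.
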
> > > This sounds a bit strange because above you mentioned the environment
> > > is IPv4, so the curious thing is where that IPv6 address is even
> > > coming from.
>
> Those addresses are coming from glibc so it's DNS or a hosts file.
>
> > The environment is equipped to be able to do dual stack, and in fact
> > the nfs-server has both IPv4 and IPv6.
> >
> > The particular client, though, is only equipped with an IPv4 address.
> >
> > For the nfs-server, both IPv4 and IPv6 addresses are in DNS and as
> > well locally on every host in the /etc/hosts file deployed via
> > configuration management.
> >
> > So both are resolvable even from the client, but the client reaches
> > the server only via IPv4.
>
> I was able to use a real IPv6 address and the ping works just fine.
>
> So it seems that the bug is actually that I try an IPv6 address at
> all on an IPv4-only node.
>
> It would be useful to know if there's an IPv6 address assigned to
> the network interface on the customer machine (even if it's the
> link-local address) so I know what to expect. You would think that
> trying to send an IPv6 packet on an interface not configured to do
> so would return an error, not just time out on connecting ...

Yes, this is the case; on the given interface there is the link-local
address assigned:

7: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether d8:5e:d3:8f:5d:94 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.220/24 brd 192.168.122.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::da5e:d3ff:fe8f:5d94/64 scope link
       valid_lft forever preferred_lft forever

> I'll need to think about a sensible way to detect that, maybe there
> was something in the earlier version (but I don't think so), so I'll
> have a look around.

Ok, I will be able to perform further tests as you need me to.

Regards,
Salvatore
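
As one possible starting point for the "sensible way to detect that"
mentioned above: glibc's getaddrinfo() has the standard AI_ADDRCONFIG flag,
which suppresses IPv6 results when the local system has no IPv6 address
configured (loopback does not count; whether a link-local-only configuration
such as the one on this client counts as "IPv6 configured" depends on the
implementation and would need checking). A small test sketch, not autofs
code, using the placeholder host name "nfs-server" from the report:

#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Resolve a host twice, once plainly and once with AI_ADDRCONFIG, to see
 * which address families the resolver considers usable on this client. */
static void resolve(const char *host, int flags, const char *label)
{
        struct addrinfo hints, *res, *ai;
        char buf[INET6_ADDRSTRLEN];

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;
        hints.ai_flags = flags;

        if (getaddrinfo(host, NULL, &hints, &res) != 0) {
                printf("%-15s lookup failed\n", label);
                return;
        }
        for (ai = res; ai != NULL; ai = ai->ai_next) {
                const void *addr = ai->ai_family == AF_INET6
                        ? (const void *)&((struct sockaddr_in6 *)ai->ai_addr)->sin6_addr
                        : (const void *)&((struct sockaddr_in *)ai->ai_addr)->sin_addr;
                printf("%-15s %s\n", label,
                       inet_ntop(ai->ai_family, addr, buf, sizeof(buf)));
        }
        freeaddrinfo(res);
}

int main(void)
{
        resolve("nfs-server", 0, "plain:");
        resolve("nfs-server", AI_ADDRCONFIG, "AI_ADDRCONFIG:");
        return 0;
}

If AI_ADDRCONFIG (or an equivalent interface scan) still reports IPv6 as
usable because of the link-local address shown above, the address scope
itself might have to be considered before probing a global IPv6 address.
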