Hi Ian, On Tue, Jul 11, 2023 at 01:13:06PM +0800, Ian Kent wrote: > On 11/7/23 08:30, Ian Kent wrote: > > On 10/7/23 20:42, Salvatore Bonaccorso wrote: > > > Hi Ian, > > > > > > On Mon, Jul 10, 2023 at 07:11:09PM +0800, Ian Kent wrote: > > > > On 10/7/23 18:29, Salvatore Bonaccorso wrote: > > > > > Hi, > > > > > > > > > > On Mon, Jul 10, 2023 at 04:33:24PM +0800, Ian Kent wrote: > > > > > > On 10/7/23 14:24, Salvatore Bonaccorso wrote: > > > > > > > Hi Ian, > > > > > > > > > > > > > > On Mon, Jul 10, 2023 at 10:05:12AM +0800, Ian Kent wrote: > > > > > > > > On 9/7/23 22:57, Salvatore Bonaccorso wrote: > > > > > > > > > Hi > > > > > > > > > > > > > > > > > > The following sort of regression was noticed > > > > > > > > > while updating a client > > > > > > > > > running Debian buster (with autofs 5.1.2 based) to Debian bullseye > > > > > > > > > (5.1.7 based autofs), but verified it's > > > > > > > > > still present with 5.1.8. The > > > > > > > > > folloing setup is present: > > > > > > > > > > > > > > > > > > There is a NFS server, dualstacked, with both public IPv4 and IPv6 > > > > > > > > > addresses resolvable in DNS. As I cannot put > > > > > > > > > the public IPs here in > > > > > > > > > the report, let's assume It is called nfs-server with addresses > > > > > > > > > 192.168.122.188 and fc00:192:168:122::188. > > > > > > > > > > > > > > > > > > The client initially running Debian buster, > > > > > > > > > is not dualstacked, has > > > > > > > > > only IPv4 and has correct routing, e.g. > > > > > > > > > pinging to nfs-server will as > > > > > > > > > well go to the IPv4 address of nfs-server, > > > > > > > > > or any tcp connection will > > > > > > > > > go to IPv4 (e.g. ssh -vvv nfs-server). > > > > > > > > > > > > > > > > > > Automount configuration is fairly simple: > > > > > > > > > > > > > > > > > > auto.master: > > > > > > > > > [...] > > > > > > > > > /home /etc/auto/homes01 > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > and /etc/auto/homes01 > > > > > > > > > [...] > > > > > > > > > * nfs-server:/srv/homes/homes01/& > > > > > > > > > [...] > > > > > > > > > > > > > > > > > > (note in fact the real configuration is slightly more complex, > > > > > > > > > nosuid,rw,hard,proto=tcp as options, but > > > > > > > > > veried as wel stripping down the > > > > > > > > > options, even dropping proto=tcp). I hope I > > > > > > > > > correctly de-anonymized > > > > > > > > > not messing up with paths in the logs below. > > > > > > > > > > > > > > > > > > After the update to autofs 5.1.7 based > > > > > > > > > version there is a 10s delay on > > > > > > > > > mounts from the servers. > > > > > > > > > > > > > > > > > > In the following logs I had to slightly > > > > > > > > > deanonymize the names. But I > > > > > > > > > think the delay problem is seen enough by the timestamps. > > > > > > > > > > > > > > > > > > Before the update, with autofs 5.1.2: > > > > > > > > > > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > handle_packet: type = 3 > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > handle_packet_missing_indirect: token 26, > > > > > > > > > name username, request pid 5285 > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > attempting to mount entry /home/username > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > lookup_mount: lookup(program): username -> > > > > > > > > > -nosuid,rw,hard,proto=tcp > > > > > > > > > nfs-server:/srv/homes/homes01/username > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > parse_mount: parse(sun): expanded entry: > > > > > > > > > -nosuid,rw,hard,proto=tcp > > > > > > > > > nfs-server:/srv/homes/homes01/username > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > parse_mount: parse(sun): gathered options: > > > > > > > > > nosuid,rw,hard,proto=tcp > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > parse_mount: parse(sun): > > > > > > > > > dequote("nfs-server:/srv/homes/homes01/username") > > > > > > > > > -> nfs-server:/srv/homes/homes01/username > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > parse_mount: parse(sun): core of entry: > > > > > > > > > options=nosuid,rw,hard,proto=tcp, > > > > > > > > > loc=nfs-server:/srv/homes/homes01/username > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > sun_mount: parse(sun): mounting root /home, > > > > > > > > > mountpoint username, what > > > > > > > > > nfs-server:/srv/homes/homes01/username, > > > > > > > > > fstype nfs, options nosuid,rw,hard,proto=tcp > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > mount_mount: mount(nfs): root=/home > > > > > > > > > name=username > > > > > > > > > what=nfs-server:/srv/homes/homes01/username, > > > > > > > > > fstype=nfs, options=nosuid,rw,hard,proto=tcp > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > mount_mount: mount(nfs): nfs > > > > > > > > > options="nosuid,rw,hard,proto=tcp", > > > > > > > > > nobind=0, nosymlink=0, ro=0 > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > get_nfs_info: called with host > > > > > > > > > nfs-server(192.168.122.188) proto 6 version > > > > > > > > > 0x30 > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > get_nfs_info: nfs v3 rpc ping time: 0.000000 > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > get_nfs_info: host nfs-server cost 0 weight > > > > > > > > > 0 > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > prune_host_list: selected subset of hosts > > > > > > > > > that support NFS3 over TCP > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > mount_mount: mount(nfs): calling mkdir_path > > > > > > > > > /home/username > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > mount_mount: mount(nfs): calling mount -t > > > > > > > > > nfs -s -o nosuid,rw,hard,proto=tcp > > > > > > > > > nfs-server:/srv/homes/homes01/username > > > > > > > > > /home/username > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > mount_mount: mount(nfs): mounted > > > > > > > > > nfs-server:/srv/homes/homes01/username on > > > > > > > > > /home/username > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: > > > > > > > > > dev_ioctl_send_ready: token = 26 > > > > > > > > > Jul 09 11:54:41 clienthost automount[5143]: mounted /home/username > > > > > > > > > > > > > > > > > > After the update to 5.1.7 (or 5.1.8): > > > > > > > > > > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > handle_packet: type = 3 > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > handle_packet_missing_indirect: token 33, > > > > > > > > > name username, request pid 7104 > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > attempting to mount entry /home/username > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > lookup_mount: lookup(program): username -> > > > > > > > > > -nosuid,rw,hard,proto=tcp > > > > > > > > > nfs-server:/srv/homes/homes01/username > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > parse_mount: parse(sun): expanded entry: > > > > > > > > > -nosuid,rw,hard,proto=tcp > > > > > > > > > nfs-server:/srv/homes/homes01/username > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > parse_mount: parse(sun): gathered options: > > > > > > > > > nosuid,rw,hard,proto=tcp > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > parse_mount: parse(sun): > > > > > > > > > dequote("nfs-server:/srv/homes/homes01/username") > > > > > > > > > -> nfs-server:/srv/homes/homes01/username > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > parse_mount: parse(sun): core of entry: > > > > > > > > > options=nosuid,rw,hard,proto=tcp, > > > > > > > > > loc=nfs-server:/srv/homes/homes01/username > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > sun_mount: parse(sun): mounting root /home, > > > > > > > > > mountpoint username, what > > > > > > > > > nfs-server:/srv/homes/homes01/username, > > > > > > > > > fstype nfs, options nosuid,rw,hard,proto=tcp > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > mount(nfs): root=/home name=username > > > > > > > > > what=nfs-server:/srv/homes/homes01/username, > > > > > > > > > fstype=nfs, options=nosuid,rw,hard,proto=tcp > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > mount(nfs): nfs > > > > > > > > > options="nosuid,rw,hard,proto=tcp", > > > > > > > > > nobind=0, nosymlink=0, ro=0 > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > get_nfs_info: called with host > > > > > > > > > nfs-server(192.168.122.188) proto 6 version > > > > > > > > > 0x20 > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > get_nfs_info: nfs v3 rpc ping time: 0.000184 > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > get_nfs_info: host nfs-server cost 183 > > > > > > > > > weight 0 > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > prune_host_list: selected subset of hosts > > > > > > > > > that support NFS3 over TCP > > > > > > > > > Jul 09 11:56:23 clienthost automount[6952]: > > > > > > > > > get_supported_ver_and_cost: called with host > > > > > > > > > nfs-server(fc00:192:168:122::188) version > > > > > > > > > 0x20 > > > > > > > > > Jul 09 11:56:33 clienthost automount[6952]: > > > > > > > > > get_supported_ver_and_cost: rpc ping time > > > > > > > > > 0.000352 > > > > > > > > > Jul 09 11:56:33 clienthost automount[6952]: > > > > > > > > > get_supported_ver_and_cost: cost 352 weight > > > > > > > > > 0 > > > > > > > > > Jul 09 11:56:33 clienthost automount[6952]: > > > > > > > > > mount_mount: mount(nfs): calling mkdir_path > > > > > > > > > /home/username > > > > > > > > > Jul 09 11:56:33 clienthost automount[6952]: > > > > > > > > > mount(nfs): calling mount -t nfs -s -o > > > > > > > > > nosuid,rw,hard,proto=tcp > > > > > > > > > nfs-server:/srv/homes/homes01/username > > > > > > > > > /home/username > > > > > > > > > Jul 09 11:56:33 clienthost automount[6952]: > > > > > > > > > mount_mount: mount(nfs): mounted > > > > > > > > > nfs-server:/srv/homes/homes01/username on > > > > > > > > > /home/username > > > > > > > > > Jul 09 11:56:33 clienthost automount[6952]: > > > > > > > > > dev_ioctl_send_ready: token = 33 > > > > > > > > > Jul 09 11:56:33 clienthost automount[6952]: mounted /home/username > > > > > > > > > > > > > > > > > > while first get_nfs_info is called again with host > > > > > > > > > nfs-server(192.168.12.2188), there is a get_supported_ver_and_cost > > > > > > > > > with the IPv6 address, hanging for 10s. > > > > > > > > The later call timed out, it's timeout is 10 seconds for addresses > > > > > > > > > > > > > > > > that look like they are on the local network so the IPv6 address is > > > > > > > > > > > > > > > > not responding. > > > > > > > > > > > > > > > > > > > > > > > > I know that 10 seconds is a bit long for interactive use, perhaps > > > > > > > > > > > > > > > > it should be a bit less, but it's better than the 2 minutes we > > > > > > > > > > > > > > > > would often see ... > > > > > > > Yes fully understand. We got complaints from clients having to wait > > > > > > > 10s for their homes to be mounted, this is how we > > > > > > > start investigating > > > > > > > it. > > > > > > > > > > > > > > > I guess you are saying it should be responding ... not sure what > > > > > > > > > > > > > > > > I can do about that, I will need to reproduce the problem and IPv6 > > > > > > > > > > > > > > > > is still not something I have a setup for easily available. > > > > > > > I'm not sure. I was actually surprised, given the > > > > > > > Client is IPv4 only > > > > > > > networking, that there was happing with the updated autofs actually > > > > > > > a call to the IPv6 address. > > > > > > > > > > > > > > > > If I just revert c578e5b37c3c ("autofs-5.1.5 > > > > > > > > > - Fix NFS mount from IPv6 > > > > > > > > > addresses") then the mount is again quick, but logs an additional > > > > > > > > > > > > > > > > > > Jul 09 16:31:34 clienthost automount[12045]: > > > > > > > > > check_address_proto: requested protocol does > > > > > > > > > not match address > > > > > > > > > > > > > > > > > > so likely not the right fix. > > > > > > > > That does raise some interesting alternatives though. > > > > > > > > > > > > > > > > I'm not sure what the function > > > > > > > > check_address_proto() is seeing but it > > > > > > > > > > > > > > > > essentially (in 5.1.8) says "if the rpc request is for IPv6 and the > > > > > > > > > > > > > > > > host address length is not set to the length of > > > > > > > > an IPv6 address bail > > > > > > > > > > > > > > > > out". So whatever is wrong there is probably still wrong just not > > > > > > > > > > > > > > > > happening due to the check returning a fail. > > > > > > > > > > > > > > > > > > > > > > > > If I can setup an environment I can check that > > > > > > > > out further but sounds > > > > > > > > > > > > > > > > a bit suspicious. I can of course check the code > > > > > > > > to see if there's an > > > > > > > > > > > > > > > > obvious setup mistake ... > > > > > > > If I can provide any help in debugging this or you can give me hints > > > > > > > what I can try let me know. I will need to release the client host > > > > > > > soonish again to the customers, but will try to make a similar test > > > > > > > client available for further testing. > > > > > > > > > > > > > > > > I suspect mounts from server, which are > > > > > > > > > dualstacked, but route to them > > > > > > > > > is only IPv4, are not correctly handled after 90532146bc0e > > > > > > > > > ("autofs-5.1.3 - fix ipv6 proto option handling") and c578e5b37c3c > > > > > > > > > ("autofs-5.1.5 - Fix NFS mount from IPv6 addresses"). > > > > > > > > I'm not sure that these changes affect the routing at all, at least > > > > > > > > > > > > > > > > not due to autofs itself and the above is a proto TCP request so it > > > > > > > > > > > > > > > > shouldn't be the server simply not responding ... not sure why it's > > > > > > > > > > > > > > > > happening. > > > > > > > > > > > > > > > > > > > > > > > > The difference between the first and second log traces looks like > > > > > > > > > > > > > > > > dns lookup has returned two hosts in the second availability ping, > > > > > > > > > > > > > > > > an IPv4 and an IPv6 address. Not sure why an IPv6 address isn't > > > > > > > > > > > > > > > > present in the first availability ping. > > > > > > > > > > > > > > > > > > > > > > > > It might also be that NFSv3 is being checked for > > > > > > > > availability rather > > > > > > > > > > > > > > > > than IPv4. I've assumed that NFS v3 works over IPv6 ... > > > > > > > About NFsv3: This is actually independent, it is the same if I use > > > > > > > NFSv4 (with -vers=4 option, which we have in the original setup, I > > > > > > > just tried to strip down as many options as possible for the > > > > > > > reproducing case, though still not minimal, but we > > > > > > > can actually ommint > > > > > > > any of the options). > > > > > > > > > > > > > > > > (note the package in Debian *is* built with --with-libtirpc). > > > > > > > > > > > > > > > > > > Does any of the above make sense? > > > > > > > > Sadly it does make sense but, as I say, I don't know what's causing > > > > > > > > > > > > > > > > it. > > > > > > > Ok. Again I'm eager to help as much as possible to > > > > > > > pinpoint the issue, > > > > > > > but might need some hints what I should/can try to isolate the > > > > > > > problem. Are there potential missconficuration on client side which > > > > > > > cause us problem for autofs in this setup? > > > > > > This sounds a bit strange because above you mentioned the environment > > > > > > > > > > > > is IPv4 so the curious thing is where is that IPv6 > > > > > > address even coming > > > > > > > > > > > > from. Those addresses are coming from glibc so it's DNS > > > > > > or a hosts file. > > > > > The environment is equipped to be able to do dual stack, and in fact > > > > > the nfs-server has both IPv4 and IPv6. > > > > > > > > > > The particular client is though only equipped with IPv4 address. > > > > > > > > > > For the nfs-server both IPv4 and IPv6 addresses are both in DNS and as > > > > > well locally on every hosts in /etc/hosts file deployed via > > > > > configuration managment. > > > > > > > > > > So both resolvable even from the client, but the client reaches the > > > > > server only via IPv4. > > > > I was able to use a real IPv6 address and the ping works just fine. > > > > > > > > > > > > So it seems that the bug is actually that I try an IPv6 address at > > > > > > > > all on an IPv4 only node. > > > > > > > > > > > > It would be useful to know if there's an IPv6 address assigned to > > > > > > > > the network interface on the customer machine (even if it's the > > > > > > > > link-local address) so I know what to expect. You would think that > > > > > > > > trying to send an IPv6 packet on an interface not configured to do > > > > > > > > so would return an error not just timeout on connecting ... > > > Yes this is the case, on the given interface there is the link-local > > > address assigned: > > > > > > 7: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc > > > noqueue state UP group default qlen 1000 > > > link/ether d8:5e:d3:8f:5d:94 brd ff:ff:ff:ff:ff:ff > > > inet 192.168.122.220/24 brd 192.168.122.255 scope global bond0 > > > valid_lft forever preferred_lft forever > > > inet6 fe80::da5e:d3ff:fe8f:5d94/64 scope link > > > valid_lft forever preferred_lft forever > > > > Oh cool, I was concerned there was a real IPv6 address on the interface > > > > but then IPv6 could be used for communication. > > > > > > I have always thought that the link-local address should be usable for > > > > local communication but I've always had trouble using them, don't know > > > > why. > > > > > > It seems to me then that if an interface doesn't have an IPv6 address > > > > (excluding the link-local address in this case) then an IPv6 address > > > > seen via DNS should excluded from the proximity calculation (since it > > > > would not be reachable via that interface) and consequently excluded > > > > when constructing the list of addresses to be probed. > > > > > > Let me see if I can set things up to test this locally. > > So I sent a patch out but forgot to add a cover letter to talk about a > > couple of things and didn't cc the list either ... > > > Hopefully you get the patch. > > > It won't apply cleanly, at the least you'll need to ignore the hunk > > to CHANGELOG. > > > Let me know if you get the patch and if you have any problems applying > > it that aren't straight forward to resolve. Thank you, the patch just arrived to me and will test it accordingly and report back. Thanks for taking the time for investigating this issue! Should the patch be resent as well to autofs@xxxxxxxxxxxxxxx for completeness? Regards, Salvatore