Re: Regression after changes for mounts from IPv6 addresses introducing delays

Hi Ian,

On Mon, Jul 10, 2023 at 10:05:12AM +0800, Ian Kent wrote:
> On 9/7/23 22:57, Salvatore Bonaccorso wrote:
> > Hi
> > 
> > The following sort of regression was noticed while updating a client
> > running Debian buster (with autofs 5.1.2 based) to Debian bullseye
> > (5.1.7 based autofs), but verified it's still present with 5.1.8. The
> > following setup is present:
> > 
> > There is a NFS server, dualstacked, with both public IPv4 and IPv6
> > addresses resolvable in DNS. As I cannot put the public IPs here in
> > the report, let's assume it is called nfs-server with addresses
> > 192.168.122.188 and fc00:192:168:122::188.
> > 
> > The client, initially running Debian buster, is not dualstacked: it
> > has only IPv4 and correct routing, e.g. pinging nfs-server goes to
> > the IPv4 address of nfs-server, as does any TCP connection
> > (e.g. ssh -vvv nfs-server).
> > 
> > Automount configuration is fairly simple:
> > 
> > auto.master:
> > [...]
> > /home            /etc/auto/homes01
> > [...]
> > 
> > and /etc/auto/homes01
> > [...]
> > * nfs-server:/srv/homes/homes01/&
> > [...]
> > 
> > (note the real configuration is in fact slightly more complex, with
> > nosuid,rw,hard,proto=tcp as options, but the issue was verified as
> > well after stripping down the options, even dropping proto=tcp). I
> > hope I correctly anonymized the paths in the logs below without
> > messing them up.
> > 
> > After the update to the autofs 5.1.7 based version there is a 10s
> > delay on mounts from the server.
> > 
> > In the following logs I had to slightly anonymize the names. But I
> > think the delay problem is clear enough from the timestamps.
> > 
> > Before the update, with autofs 5.1.2:
> > 
> > Jul 09 11:54:41 clienthost automount[5143]: handle_packet: type = 3
> > Jul 09 11:54:41 clienthost automount[5143]: handle_packet_missing_indirect: token 26, name username, request pid 5285
> > Jul 09 11:54:41 clienthost automount[5143]: attempting to mount entry /home/username
> > Jul 09 11:54:41 clienthost automount[5143]: lookup_mount: lookup(program): username -> -nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username
> > Jul 09 11:54:41 clienthost automount[5143]: parse_mount: parse(sun): expanded entry: -nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username
> > Jul 09 11:54:41 clienthost automount[5143]: parse_mount: parse(sun): gathered options: nosuid,rw,hard,proto=tcp
> > Jul 09 11:54:41 clienthost automount[5143]: parse_mount: parse(sun): dequote("nfs-server:/srv/homes/homes01/username") -> nfs-server:/srv/homes/homes01/username
> > Jul 09 11:54:41 clienthost automount[5143]: parse_mount: parse(sun): core of entry: options=nosuid,rw,hard,proto=tcp, loc=nfs-server:/srv/homes/homes01/username
> > Jul 09 11:54:41 clienthost automount[5143]: sun_mount: parse(sun): mounting root /home, mountpoint username, what nfs-server:/srv/homes/homes01/username, fstype nfs, options nosuid,rw,hard,proto=tcp
> > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): root=/home name=username what=nfs-server:/srv/homes/homes01/username, fstype=nfs, options=nosuid,rw,hard,proto=tcp
> > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): nfs options="nosuid,rw,hard,proto=tcp", nobind=0, nosymlink=0, ro=0
> > Jul 09 11:54:41 clienthost automount[5143]: get_nfs_info: called with host nfs-server(192.168.122.188) proto 6 version 0x30
> > Jul 09 11:54:41 clienthost automount[5143]: get_nfs_info: nfs v3 rpc ping time: 0.000000
> > Jul 09 11:54:41 clienthost automount[5143]: get_nfs_info: host nfs-server cost 0 weight 0
> > Jul 09 11:54:41 clienthost automount[5143]: prune_host_list: selected subset of hosts that support NFS3 over TCP
> > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): calling mkdir_path /home/username
> > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): calling mount -t nfs -s -o nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username /home/username
> > Jul 09 11:54:41 clienthost automount[5143]: mount_mount: mount(nfs): mounted nfs-server:/srv/homes/homes01/username on /home/username
> > Jul 09 11:54:41 clienthost automount[5143]: dev_ioctl_send_ready: token = 26
> > Jul 09 11:54:41 clienthost automount[5143]: mounted /home/username
> > 
> > After the update to 5.1.7 (or 5.1.8):
> > 
> > Jul 09 11:56:23 clienthost automount[6952]: handle_packet: type = 3
> > Jul 09 11:56:23 clienthost automount[6952]: handle_packet_missing_indirect: token 33, name username, request pid 7104
> > Jul 09 11:56:23 clienthost automount[6952]: attempting to mount entry /home/username
> > Jul 09 11:56:23 clienthost automount[6952]: lookup_mount: lookup(program): username -> -nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username
> > Jul 09 11:56:23 clienthost automount[6952]: parse_mount: parse(sun): expanded entry: -nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username
> > Jul 09 11:56:23 clienthost automount[6952]: parse_mount: parse(sun): gathered options: nosuid,rw,hard,proto=tcp
> > Jul 09 11:56:23 clienthost automount[6952]: parse_mount: parse(sun): dequote("nfs-server:/srv/homes/homes01/username") -> nfs-server:/srv/homes/homes01/username
> > Jul 09 11:56:23 clienthost automount[6952]: parse_mount: parse(sun): core of entry: options=nosuid,rw,hard,proto=tcp, loc=nfs-server:/srv/homes/homes01/username
> > Jul 09 11:56:23 clienthost automount[6952]: sun_mount: parse(sun): mounting root /home, mountpoint username, what nfs-server:/srv/homes/homes01/username, fstype nfs, options nosuid,rw,hard,proto=tcp
> > Jul 09 11:56:23 clienthost automount[6952]: mount(nfs): root=/home name=username what=nfs-server:/srv/homes/homes01/username, fstype=nfs, options=nosuid,rw,hard,proto=tcp
> > Jul 09 11:56:23 clienthost automount[6952]: mount(nfs): nfs options="nosuid,rw,hard,proto=tcp", nobind=0, nosymlink=0, ro=0
> > Jul 09 11:56:23 clienthost automount[6952]: get_nfs_info: called with host nfs-server(192.168.122.188) proto 6 version 0x20
> > Jul 09 11:56:23 clienthost automount[6952]: get_nfs_info: nfs v3 rpc ping time: 0.000184
> > Jul 09 11:56:23 clienthost automount[6952]: get_nfs_info: host nfs-server cost 183 weight 0
> > Jul 09 11:56:23 clienthost automount[6952]: prune_host_list: selected subset of hosts that support NFS3 over TCP
> > Jul 09 11:56:23 clienthost automount[6952]: get_supported_ver_and_cost: called with host nfs-server(fc00:192:168:122::188) version 0x20
> > Jul 09 11:56:33 clienthost automount[6952]: get_supported_ver_and_cost: rpc ping time 0.000352
> > Jul 09 11:56:33 clienthost automount[6952]: get_supported_ver_and_cost: cost 352 weight 0
> > Jul 09 11:56:33 clienthost automount[6952]: mount_mount: mount(nfs): calling mkdir_path /home/username
> > Jul 09 11:56:33 clienthost automount[6952]: mount(nfs): calling mount -t nfs -s -o nosuid,rw,hard,proto=tcp nfs-server:/srv/homes/homes01/username /home/username
> > Jul 09 11:56:33 clienthost automount[6952]: mount_mount: mount(nfs): mounted nfs-server:/srv/homes/homes01/username on /home/username
> > Jul 09 11:56:33 clienthost automount[6952]: dev_ioctl_send_ready: token = 33
> > Jul 09 11:56:33 clienthost automount[6952]: mounted /home/username
> > 
> > While the first get_nfs_info is again called with host
> > nfs-server(192.168.122.188), there is a get_supported_ver_and_cost
> > call with the IPv6 address, hanging for 10s.
> 
> The latter call timed out; its timeout is 10 seconds for addresses
> that look like they are on the local network, so the IPv6 address is
> not responding.
> 
> I know that 10 seconds is a bit long for interactive use, perhaps
> it should be a bit less, but it's better than the 2 minutes we
> would often see ...

Yes, fully understood. We got complaints from clients having to wait
10s for their homes to be mounted; this is how we started
investigating it.
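
To make the failure mode concrete, here is a hypothetical Python
sketch (not autofs code; the real probing is done via SUN RPC pings):
when candidate addresses are tried sequentially with a per-address
timeout, a single non-responding address stalls the whole mount by the
full timeout before the mount proceeds, which matches the observed 10s
gap between the two get_supported_ver_and_cost log lines.

```python
import socket

def probe_addresses(addresses, port, timeout):
    """Try each (family, address) in order with a bounded connect.

    An address that never answers costs the full per-address timeout
    before the next candidate is considered -- the shape of the 10s
    stall seen in the logs. This is an illustrative sketch only.
    """
    reachable = []
    for family, addr in addresses:
        s = socket.socket(family, socket.SOCK_STREAM)
        s.settimeout(timeout)  # per-address budget (autofs: ~10s for "local" hosts)
        try:
            s.connect((addr, port))
            reachable.append(addr)
        except OSError:
            pass  # timed out or refused; total delay grows accordingly
        finally:
            s.close()
    return reachable
```

With a dual-stacked server name this list contains both the IPv4 and
the IPv6 address, so an unreachable IPv6 entry delays the otherwise
instant IPv4 mount.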

> I guess you are saying it should be responding ... not sure what
> I can do about that. I will need to reproduce the problem, and IPv6
> is still not something I have a setup for easily available.

I'm not sure. I was actually surprised that, given the client has
IPv4-only networking, the updated autofs was making a call to the
IPv6 address at all.

> > 
> > If I just revert c578e5b37c3c ("autofs-5.1.5 - Fix NFS mount from IPv6
> > addresses") then the mount is again quick, but logs an additional
> > 
> > Jul 09 16:31:34 clienthost automount[12045]: check_address_proto: requested protocol does not match address
> > 
> > so likely not the right fix.
> 
> That does raise some interesting alternatives though.
> 
> I'm not sure what the function check_address_proto() is seeing, but it
> essentially (in 5.1.8) says "if the rpc request is for IPv6 and the
> host address length is not set to the length of an IPv6 address, bail
> out". So whatever is wrong there is probably still wrong, just not
> happening due to the check returning a fail.
> 
> If I can set up an environment I can check that out further, but it
> sounds a bit suspicious. I can of course check the code to see if
> there's an obvious setup mistake ...

If I can provide any help in debugging this, or you can give me hints
on what I can try, let me know. I will need to release the client host
back to the customers soonish, but will try to make a similar test
client available for further testing.
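
For my own understanding, the rule as you describe it would render in
Python roughly as below (an assumption on my part, not the autofs
source; the sockaddr sizes are the usual Linux ones, 16 bytes for
struct sockaddr_in and 28 for struct sockaddr_in6):

```python
import socket

# Usual Linux sizes; an assumption for this sketch.
SOCKADDR_IN_LEN = 16   # sizeof(struct sockaddr_in)
SOCKADDR_IN6_LEN = 28  # sizeof(struct sockaddr_in6)

def check_address_proto(requested_family, addr_len):
    """Bail out (return False) when the stored address length does not
    match the family the RPC request is being made for -- the check
    that fires as 'requested protocol does not match address' when
    c578e5b37c3c is reverted."""
    if requested_family == socket.AF_INET6:
        return addr_len == SOCKADDR_IN6_LEN
    if requested_family == socket.AF_INET:
        return addr_len == SOCKADDR_IN_LEN
    return False
```

If that reading is right, the interesting question is why an IPv6
request is being constructed with an IPv4-length address in the first
place.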

> > I suspect mounts from server, which are dualstacked, but route to them
> > is only IPv4, are not correctly handled after 90532146bc0e
> > ("autofs-5.1.3 - fix ipv6 proto option handling") and c578e5b37c3c
> > ("autofs-5.1.5 - Fix NFS mount from IPv6 addresses").
> 
> I'm not sure that these changes affect the routing at all, at least
> not due to autofs itself, and the above is a proto TCP request so it
> shouldn't be the server simply not responding ... not sure why it's
> happening.
> 
> The difference between the first and second log traces looks like
> dns lookup has returned two hosts in the second availability ping,
> an IPv4 and an IPv6 address. Not sure why an IPv6 address isn't
> present in the first availability ping.
> 
> It might also be that NFSv3 is being checked for availability rather
> than IPv4. I've assumed that NFS v3 works over IPv6 ...

About NFSv3: this is actually independent; it is the same if I use
NFSv4 (with the -vers=4 option, which we have in the original setup).
I just tried to strip down as many options as possible for the
reproducing case. It is still not minimal, and we can actually omit
any of the options.
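
For completeness, the map entry of the original setup (same anonymized
path as above; this is my reconstruction from the options mentioned,
not a verbatim copy) looks roughly like:

[...]
* -nosuid,rw,hard,proto=tcp,vers=4 nfs-server:/srv/homes/homes01/&
[...]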

> > 
> > (note the package in Debian *is* built with --with-libtirpc).
> > 
> > Does any of the above make sense?
> 
> Sadly it does make sense but, as I say, I don't know what's causing
> it.

OK. Again, I'm eager to help as much as possible to pinpoint the
issue, but I might need some hints on what I should/can try to isolate
the problem. Is there a potential misconfiguration on the client side
which causes this problem for autofs in this setup?

Regards,
Salvatore


