On Fri, May 06, 2016 at 04:04:02PM +0530, Kaushal M wrote:
> I'm currently trying to straighten out the encrypted transport
> (SSL/TLS socket) code, make it more robust, and have it work well
> with IPv6 in particular [1]. When testing the changes, the
> mount.glusterfs script caused some trouble.
>
> The mount script tries to check if the mount is online by performing
> a stat on the mount point after the glusterfs command returns, and
> unmounts if the stat fails. This check is racy and doesn't always do
> the right thing.
>
> The check is racy because it can run before the client translators
> have been able to connect to the bricks. The following sequence of
> events happens when the mount is performed, which helps explain the
> race.
>
> - mount script runs the glusterfs command
> - mount process fetches the volfile
> - mount process initializes the graph. The client xlator is also
>   initialized now, but the connections aren't started.
> - mount process sends a PARENT_UP event to the graph. The client now
>   begins the connection process (portmap first, followed by
>   connecting to the brick). It's not guaranteed yet that the
>   connection has happened.
> - mount process returns
> - mount script does a stat on the mount point to check health
>
> In an environment like the one I'm testing in, the connection can't
> be completed by the time the health check is done. In my
> environment, the client connection sequence is as follows:
> - the portmap connection is started
> - the first address returned for the hostname is an IPv6 address.
>   With the IPv6 change that was merged recently, name lookups are
>   done with AF_UNSPEC, which returns IPv6 addresses. My environment
>   returns v6 addresses first for getaddrinfo calls (which I think is
>   the default for a lot of environments)
> - the connection fails as glusterd doesn't listen on IPv6 addresses
>   (it listens on 0.0.0.0, which is v4 only)
> - a reconnection is made with the next address. This takes a while
>   because of the encrypted transports.
> - the portmap query is done after the connection is established and
>   the port is obtained
> - the client xlator now reconnects to the obtained port
> - (the same connection/reconnection cycle as above happens)
> - once the connection is established, handshakes are done
> - a CHILD_UP event is sent
>
> After this point the client xlator becomes usable.
>
> But this point is not reached before the mount script does the
> health check in my environment. So the mount ends up being
> terminated.
>
> Now the simplest solution would be to sleep for some time before
> doing the check, to give the xlators time to get ready. But this is
> non-deterministic and isn't something I'm very fond of.
>
> This is turning out to be problematic in my very simple environment,
> and I think it's going to be a bigger problem in larger, more complex
> environments. My environment is:
> - single node
> - single brick volume
> - client is the same node
> - IO transport encryption is on
> - management transport encryption is on
> - IPv6 enabled in the kernel, but no actual IPv6 network in place
>   (disabling IPv6 in the kernel makes the problem go away, but I want
>   to test with IPv6)
>
> Does anyone else have ideas on how to fix this? (For now I've
> disabled this check in the script.)

I would argue that CHILD_UP is not sufficient for the glusterfs binary
to daemonize/exit. Instead of doing the stat in the mount.glusterfs
script, it would probably be better to move this check to the fuse part
of the client xlator-stack. Once the lookup on the mountpoint is done, a
valid return value can be given. This can then be success/fail, and the
mount.glusterfs script can easily check that.
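To illustrate the idea, here is a minimal, hypothetical C sketch of the
common "report readiness to the parent" pattern: the parent only exits
once the (daemonized) child has told it whether the first lookup
succeeded. None of this is existing GlusterFS code; wait_for_first_lookup()
and the single-byte pipe protocol are made-up placeholders.

```c
/* Hypothetical sketch, not GlusterFS code: parent exits only after the
 * child reports the outcome of its first lookup over a pipe. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

/* Stand-in for graph init + waiting until fuse-bridge has seen CHILD_UP
 * and resolved the root inode.  Returns 0 if the mount is usable. */
static int wait_for_first_lookup(void)
{
        /* ... connect to bricks, wait for CHILD_UP, lookup "/" ... */
        return 0;
}

int main(void)
{
        int pipefd[2];

        if (pipe(pipefd) == -1) {
                perror("pipe");
                return EXIT_FAILURE;
        }

        pid_t pid = fork();
        if (pid == -1) {
                perror("fork");
                return EXIT_FAILURE;
        }

        if (pid == 0) {                 /* child: the actual daemon */
                close(pipefd[0]);
                char result = (char)wait_for_first_lookup();
                /* Tell the parent whether the mount is really usable. */
                if (write(pipefd[1], &result, 1) != 1)
                        _exit(1);
                close(pipefd[1]);
                /* ... keep serving the mount (placeholder) ... */
                pause();
                _exit(0);
        }

        /* parent: block until the child reports, then exit with that
         * status, so the caller gets a meaningful return code instead of
         * racing with a stat on the mountpoint. */
        close(pipefd[1]);
        char result = 1;
        if (read(pipefd[0], &result, 1) != 1)
                result = 1;             /* child died before reporting */
        close(pipefd[0]);
        return result == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

With something along these lines, mount.glusterfs would only have to
check the exit code of the glusterfs command, and the racy stat (or any
sleep) could go away entirely.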
The --verbose mount option (http://review.gluster.org/11469) that
Prasanna is adding would be able to use this as well. With --verbose the
process (like mount.nfs) should only output messages on the console
until the mounting is complete (until the fuse-bridge has finished the
lookup).

To me, the current 'stat' in the mount.glusterfs script looks more like
a hack than a serious solution.

HTH,
Niels