Re: mount.glusterfs health check troubles - help appreciated

Kaushal M <kshlmster@xxxxxxxxx> · Mon, 9 May 2016 10:21:42 +0530

On Sat, May 7, 2016 at 3:03 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
> On Fri, May 06, 2016 at 04:04:02PM +0530, Kaushal M wrote:
>> I'm currently trying to straighten out the encrypted transport
>> (SSL/TLS socket) code, and make it more robust, and work well with
>> IPv6 in particular [1]. When testing the changes, the mount.glusterfs
>> script caused some trouble.
>>
>> The mount script tries to check if the mount is online by performing a
>> stat on the mount point after the glusterfs command returns, and
>> umounts if the stat fails. This check is racy and doesn't always
>> do the right thing.
>>
>> The check is racy because it could be run before the client
>> translators have been able to connect to the bricks. The following
>> sequence of events occurs during a mount and helps explain the
>> race.
>>
>> - mount script runs the glusterfs command
>> - mount process fetches the volfile
>> - mount process initializes the graph. The client xlator is also
>> initialized now, but the connections aren't started.
>> - mount process sends a PARENT_UP event to the graph. The client now
>> begins the connection process (portmap first, followed by connecting
>> to the brick). There is no guarantee yet that the connection has
>> been established.
>> - mount process returns
>> - mount script does a stat on mount point to check health
>>
>> In an environment like the one I'm testing in, the connection can't
>> be completed by the time the health check is done. In my environment,
>> the client connection sequence is as follows:
>> - the portmap connection is started
>>  - the first address returned for the hostname is an IPv6 address.
>> With the IPv6 change that was merged recently, name lookups are done
>> with AF_UNSPEC, which returns IPv6 addresses. My environment
>> returns v6 addresses first for getaddrinfo calls (which I think is
>> the default for a lot of environments)
>>  - the connection fails as glusterd doesn't listen on IPv6 addresses
>> (it listens on 0.0.0.0, which is v4 only)
>>  - a reconnection is made with the next address. This takes a while
>> because of the encrypted transports.
>>  - portmap query is done after connection is established and the port
>> is obtained
>> - the client xlator now reconnects to the obtained port.
>>  - (the same connection/reconnection cycle as above happens)
>> - once connection is established, handshakes are done
>> - CHILD_UP event is sent
>>
>> After this point the client xlator becomes usable.
>>
>> But this is not reached before the mount script does the health check
>> in my environment. So the mount ends up being terminated.
>>
>> Now the simplest solution would be to sleep for some time before doing
>> the check to give the xlators time to get ready. But this is
>> non-deterministic and isn't something I'm very fond of.
>>
>> This is turning out to be problematic in my very simple environment,
>> and I think it's going to be a bigger problem in larger, more complex
>> environments. My environment is:
>> - single node
>> - single brick volume
>> - client is the same node
>> - IO transport encryption is on
>> - Management transport encryption is on
>> - IPv6 enabled in kernel, no actual IPv6 network is in place
>> (disabling IPv6 in kernel causes the problem to stop, but I want to
>> test with IPv6)
>>
>> Does anyone else have ideas on how to fix this? (For now I've disabled
>> this check in the script).
>
> I would argue that CHILD_UP is not sufficient for the glusterfs binary
> to daemonize/exit. Instead of doing the stat in the mount.glusterfs
> script, it would probably be better to move this to the fuse part of the
> client xlator-stack. Once the lookup on the mountpoint is done, a valid
> return value can be given. This can then be success/fail, and the
> mount.glusterfs script can easily check that.

CHILD_UP could be sufficient if the daemon returned only after the
event happened. With the event, we can at least be sure that the
xlator is ready to process requests. Any failure that happens after
this would be a valid failure. But the process is returning before the
CHILD_UP event, leading to requests being sent to a xlator that's not
ready.
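
As a stopgap on the script side, the single stat could be replaced by a
bounded poll. The sketch below is in Python purely for illustration
(mount.glusterfs itself is a shell script). The name `wait_for_mount`,
its parameters, and the assumption that a stat on a not-yet-connected
mount raises an OSError are mine, not anything in the current code.
Unlike a fixed sleep it returns as soon as the mount answers, but it is
still bounded by an arbitrary timeout.

```python
import os
import time


def wait_for_mount(path, timeout=10.0, interval=0.2, probe=os.stat):
    """Poll probe(path) until it succeeds or `timeout` seconds pass.

    Returns True if a probe succeeded before the deadline, else False.
    `probe` is injectable only so the sketch can be exercised without
    a real mount; it defaults to os.stat.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            probe(path)
            return True
        except OSError:
            # Assumption: the client is not connected to its bricks yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would unmount only when `wait_for_mount(mountpoint)` returns
False.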

It's my opinion as well that the mount process needs to do a better
job of returning its status. This requires changes to glusterfsd and
the fuse xlator. I don't have an exact picture of how this would be
done, but I think it's going to be slightly complex, and I don't think
I can put in additional time doing this. I'm willing to assist any
volunteer willing to solve this.
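
One generic way for a daemonizing process to report its status is a pipe
between parent and child: the parent exits only after the child has
written a status byte. The sketch below shows that pattern in Python.
`run_with_status` and `child_main` are hypothetical names, and this is
not a description of how glusterfsd or the fuse xlator are structured.
`child_main` stands in for whatever decides that the first lookup on the
mountpoint has completed.

```python
import os


def run_with_status(child_main):
    """Fork, then block in the parent until the child reports status.

    Returns 0 if the child reported success and 1 otherwise, including
    the case where the child exits without reporting anything.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the process that stays behind as daemon.
        os.close(read_fd)
        try:
            ok = bool(child_main())
        except Exception:
            ok = False
        os.write(write_fd, b"\x00" if ok else b"\x01")
        os.close(write_fd)
        os._exit(0)
    # Parent: wait for the status byte before returning to the caller.
    os.close(write_fd)
    status = os.read(read_fd, 1)  # b"" if the child died silently
    os.close(read_fd)
    os.waitpid(pid, 0)  # sketch only: a real daemon would keep running
    return 0 if status == b"\x00" else 1
```

With something along these lines, the mount script would only need to
check the exit code of the command.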

>
> The --verbose mount option (http://review.gluster.org/11469) that
> Prasanna is adding would be able to use this as well. With --verbose
> the process (like mount.nfs) should only output the messages on the
> console until the mounting is complete (until fuse-bridge finished the
> lookup).
>
> To me, the current 'stat' in the mount.glusterfs script looks more like
> a hack than a serious solution.

This is what other developers I've spoken to believe as well. The
health check is a hack.

Besides solutions to the mount script problem, I've been looking at
fixing the other cause of the troubles.
The reconnection happens because the glusterfs daemons default to
listening on IPv4 interfaces only. If glusterfs clients are going to
try IPv6 addresses by default when they are available, the servers
should also listen on IPv6 by default.
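
The client-side behaviour in question, trying each address from an
AF_UNSPEC lookup in order and falling back when one fails, looks roughly
like the sketch below. It uses plain Python sockets, not the gluster
socket transport, and `connect_first_reachable` is a hypothetical
helper. When the server listens on v4 only and v6 addresses sort first,
every connection pays for at least one failed attempt.

```python
import socket


def connect_first_reachable(host, port, timeout=2.0):
    """Try each address getaddrinfo returns, in order, until one works."""
    last_error = None
    infos = socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                               socket.SOCK_STREAM)
    for family, socktype, proto, _canonname, sockaddr in infos:
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # For example a v6 address nobody listens on: try the next.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses returned for %r" % (host,))
```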

I tested this out by changing the default TCP socket listeners to be
IPv6 (using in6addr_any, which listens on all available interfaces, on
both v4 and v6).
This works really well and I'll be pushing this change as well.
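
For reference, a dual-stack listener of this kind can be sketched with
plain sockets. This is Python rather than the actual C change, and
`make_dual_stack_listener` is a hypothetical name. Binding to "::" is
the Python spelling of in6addr_any. The sketch clears IPV6_V6ONLY
explicitly, since whether a v6 socket also accepts v4 by default depends
on the system setting.

```python
import socket


def make_dual_stack_listener(port=0, backlog=16):
    """Return a TCP socket listening on all interfaces, v4 and v6."""
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # 0 means: also accept IPv4 peers, seen as v4-mapped addresses.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
    sock.bind(("::", port))
    sock.listen(backlog)
    return sock
```

An IPv4 client connecting to such a socket shows up on the server side
with a v4-mapped peer address of the form ::ffff:a.b.c.d.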

~kaushal

>
> HTH,
> Niels