On Fri, May 06, 2016 at 04:04:02PM +0530, Kaushal M wrote:
> I'm currently trying to straighten out the encrypted transport
> (SSL/TLS socket) code, make it more robust, and have it work well
> with IPv6 in particular [1]. When testing the changes, the
> mount.glusterfs script caused some trouble.
>
> The mount script tries to check if the mount is online by performing
> a stat on the mount point after the glusterfs command returns, and
> unmounts if the stat fails. This check is racy and doesn't always do
> the right thing.
>
> The check is racy because it can run before the client translators
> have been able to connect to the bricks. The following sequence of
> events happens when the mount is performed, which helps explain the
> race.
>
> - mount script runs the glusterfs command
> - mount process fetches the volfile
> - mount process initializes the graph. The client xlator is also
>   initialized now, but the connections aren't started.
> - mount process sends a PARENT_UP event to the graph. The client now
>   begins the connection process (portmap first, followed by
>   connecting to the brick). It's not guaranteed yet that the
>   connection has happened.
> - mount process returns
> - mount script does a stat on the mount point to check health
>
> In an environment like the one I'm testing in, the connection can't
> be completed by the time the health check is done. In my
> environment, the client connection sequence is as follows:
> - the portmap connection is started
> - the first address returned for the hostname is an IPv6 address.
>   With the IPv6 change that was merged recently, name lookups are
>   done with AF_UNSPEC, which returns IPv6 addresses. My environment
>   returns v6 addresses first for getaddrinfo calls (which I think is
>   the default for a lot of environments)
> - the connection fails as glusterd doesn't listen on IPv6 addresses
>   (it listens on 0.0.0.0, which is v4 only)
> - a reconnection is made with the next address. This takes a while
>   because of the encrypted transports.
> - the portmap query is done after the connection is established and
>   the port is obtained
> - the client xlator now reconnects to the obtained port
> - (the same connection/reconnection cycle as above happens)
> - once the connection is established, handshakes are done
> - a CHILD_UP event is sent
>
> After this point the client xlator becomes usable.
>
> But this point is not reached before the mount script does the
> health check in my environment. So the mount ends up being
> terminated.
>
> Now the simplest solution would be to sleep for some time before
> doing the check, to give the xlators time to get ready. But this is
> non-deterministic and isn't something I'm very fond of.
>
> This is turning out to be problematic in my very simple environment,
> and I think it's going to be a bigger problem in larger, more complex
> environments. My environment is:
> - single node
> - single brick volume
> - client is the same node
> - IO transport encryption is on
> - management transport encryption is on
> - IPv6 enabled in the kernel, but no actual IPv6 network in place
>   (disabling IPv6 in the kernel makes the problem go away, but I want
>   to test with IPv6)
>
> Does anyone else have ideas on how to fix this? (For now I've
> disabled this check in the script.)

I would argue that CHILD_UP is not sufficient for the glusterfs binary
to daemonize/exit. Instead of doing the stat in the mount.glusterfs
script, it would probably be better to move this check to the fuse part
of the client xlator-stack. Once the lookup on the mountpoint is done, a
valid return value can be given. This can then be success/fail, and the
mount.glusterfs script can easily check that.
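To illustrate the idea, here is a minimal, hypothetical C sketch of the
common "report readiness to the parent" pattern: the parent only exits
once the (daemonized) child has told it whether the first lookup
succeeded. None of this is existing GlusterFS code; wait_for_first_lookup()
and the single-byte pipe protocol are made-up placeholders.

```c
/* Hypothetical sketch, not GlusterFS code: parent exits only after the
 * child reports the outcome of its first lookup over a pipe. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>

/* Stand-in for graph init + waiting until fuse-bridge has seen CHILD_UP
 * and resolved the root inode.  Returns 0 if the mount is usable. */
static int wait_for_first_lookup(void)
{
        /* ... connect to bricks, wait for CHILD_UP, lookup "/" ... */
        return 0;
}

int main(void)
{
        int pipefd[2];

        if (pipe(pipefd) == -1) {
                perror("pipe");
                return EXIT_FAILURE;
        }

        pid_t pid = fork();
        if (pid == -1) {
                perror("fork");
                return EXIT_FAILURE;
        }

        if (pid == 0) {                 /* child: the actual daemon */
                close(pipefd[0]);
                char result = (char)wait_for_first_lookup();
                /* Tell the parent whether the mount is really usable. */
                if (write(pipefd[1], &result, 1) != 1)
                        _exit(1);
                close(pipefd[1]);
                /* ... keep serving the mount (placeholder) ... */
                pause();
                _exit(0);
        }

        /* parent: block until the child reports, then exit with that
         * status, so the caller gets a meaningful return code instead of
         * racing with a stat on the mountpoint. */
        close(pipefd[1]);
        char result = 1;
        if (read(pipefd[0], &result, 1) != 1)
                result = 1;             /* child died before reporting */
        close(pipefd[0]);
        return result == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

With something along these lines, mount.glusterfs would only have to
check the exit code of the glusterfs command, and the racy stat (or any
sleep) could go away entirely.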
The --verbose mount option (http://review.gluster.org/11469) that
Prasanna is adding would be able to use this as well. With --verbose the
process (like mount.nfs) should only output messages on the console
until the mounting is complete (until the fuse-bridge has finished the
lookup).

To me, the current 'stat' in the mount.glusterfs script looks more like
a hack than a serious solution.

HTH,
Niels