On Fri, May 6, 2016 at 4:59 PM, Sachidananda URS <surs@xxxxxxxxxx> wrote:
>
>
> On Fri, May 6, 2016 at 4:04 PM, Kaushal M <kshlmster@xxxxxxxxx> wrote:
>>
>> I'm currently trying to straighten out the encrypted transport
>> (SSL/TLS socket) code, and make it more robust, and work well with
>> IPv6 in particular [1]. When testing the changes, the mount.glusterfs
>> script causes some trouble.
>>
>> The mount script tries to check if the mount is online by performing a
>> stat on the mount point after the glusterfs command returns, and
>> umounts if the stat fails. This check is racy and doesn't always
>> do the right thing.
>>
>> The check is racy because it could be run before the client
>> translators have been able to connect to the bricks. The following
>> sequence of events happens during a mount, which helps explain
>> the race.
>>
>> - mount script runs the glusterfs command
>> - mount process fetches the volfile
>> - mount process initializes the graph. The client xlator is also
>> initialized now, but the connections aren't started.
>> - mount process sends a PARENT_UP event to the graph. The client now
>> begins the connection process (portmap first, followed by connecting
>> to the brick). It's not guaranteed yet that the connection happened.
>> - mount process returns
>> - mount script does a stat on mount point to check health
>>
>> In an environment (like the one I'm testing in) the connection couldn't
>> be completed by the time the health check is done. In my environment,
>> the client connection sequence is as follows,
>> - the portmap connection is started
>> - the first address returned for the hostname is an IPv6 address. With
>> the IPv6 change that was merged recently name lookups are done with
>> AF_UNSPEC, which return IPv6.
>> My environment returns v6 addresses
>> first for getaddrinfo calls (which I think is the default for a lot of
>> environments)
>> - the connection fails as glusterd doesn't listen on IPv6 addresses
>> (it listens on 0.0.0.0 which is v4 only)
>> - a reconnection is made with the next address. This takes a while
>> because of the encrypted transports.
>> - portmap query is done after connection is established and the port
>> is obtained
>> - the client xlator now reconnects to the obtained port.
>> - (same above cycle of connection/reconnection happens)
>> - once connection is established, handshakes are done
>> - CHILD_UP event is sent
>>
>> After this point the client xlator becomes usable.
>>
>> But this is not reached before the mount script does the health check
>> in my environment. So the mount ends up being terminated.
>>
>> Now the simplest solution would be to sleep for some time before doing
>> the check to give the xlators time to get ready. But this is
>> non-deterministic and isn't something I'm very fond of.
>>
>
>
> Have you tried the wait builtin?

I don't think it would help. `wait` is used to wait for background
processes to complete. But the mount script launches the mount process
in the foreground, which forks and quits after the child process gives
a return value. So `wait` isn't of much use here.

>
> -sac
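
To illustrate the point about `wait`, here is a minimal sketch. It assumes a hypothetical `fake_mount` function standing in for the glusterfs command (it is not the real mount script or binary): a foreground call that forks a child and returns. Since that child is not a background job of the calling shell, `wait` should return right away instead of blocking until the child is done.

```shell
#!/bin/sh
# Hypothetical stand-in for the glusterfs command (an assumption for this
# sketch): the foreground subshell forks a child and exits, so the child
# is orphaned and is never a background job of the calling shell.
fake_mount() {
    ( sleep 3 >/dev/null 2>&1 & )
}

start=$(date +%s)
fake_mount      # returns as soon as the subshell has forked the child
wait            # this shell has no background jobs, so nothing to wait for
end=$(date +%s)

elapsed=$((end - start))
echo "wait returned after ${elapsed}s; the forked child may still be running"
```

The `sleep 3` only plays the role of the time the client xlator needs to connect; `wait` gives the script no way to learn when that has finished.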