On 09/18/2009 02:28 PM, Anand Avati wrote:
>> For me, it does not clear after 3 mins or 3 hours. I restarted the machines
>> at midnight, and the first time I tried again was around 1pm the next day
>> (13 hours). I easily recognize the symptoms, as the /bin/mount remains in the
>> process tree. I can't get a strace -p on the /bin/mount process since it is
>> frozen. The glusterfsd process is not frozen - the glusterfs process seems
>> to be waiting on /bin/mount to complete. The only way to unfreeze the mount
>> seems to be to kill -9 /bin/mount (regular kill does not work), at which
>> point the mount point goes into the disconnected state, and it is recovered
>> using unmount / remount. I tried to track down the problem before, but became
>> confused, because glusterfs seems to do its own FUSE mount management
>> rather than using the standard (for Linux anyways?) FUSE user space
>> libraries. If my memory is correct, it seems like the process is: I run
>> mount, the mount runs /sbin/mount.glusterfs, which runs glusterfs, which
>> runs /bin/mount with the full options?
>>
> This looks like a different issue from what I previously described. If
> you are certain that the /bin/mount which was hung was the one which
> glusterfs had spawned, then the issue might be something else. The way
> fuse based filesystems mount is two-fold. The first 'mount -t glusterfs'
> starts /bin/mount, which in turn calls /sbin/mount.glusterfs. This
> starts the glusterfs binary, which at the time of initializing the
> fuse xlator calls libfuse's fuse_mount(). libfuse in turn does the
> second phase of mounting by calling mount -t fuse, and in turn
> /sbin/mount.fuse. I'm trying to think how the three machines
> rebooting together can be correlated with the second-phase fuse mount
> hanging.
>

Thanks for looking at this. The above is compatible with my thinking.
I'll see about getting output to prove it.

Cheers,
mark

-- 
Mark Mielke <mark at mielke.cc>