The continuing story ...

> For me, it does not clear after 3 mins or 3 hours. I restarted the machines
> at midnight, and the first time I tried again was around 1pm the next day
> (13 hours). I easily recognize the symptoms as the /bin/mount remains in the
> process tree. I can't get a strace -p on the /bin/mount process since it is
> frozen. The glusterfsd process is not frozen - the glusterfs process seems
> to be waiting on /bin/mount to complete. The only way to unfreeze the mount
> seems to be to kill -9 /bin/mount (regular kill does not work), at which the
> mount point goes into the disconnected state, and it is recovered using
> unmount / remount. I tried to track down the problem before, but became
> confused, because glusterfs seems to do its own FUSE mount management
> rather than using the standard (for Linux anyways?) FUSE user space
> libraries. If my memory is correct - it seems like the process is: I run
> mount, the mount runs /sbin/mount.glusterfs, which runs glusterfs, which
> runs /bin/mount with the full options?

This looks like a different issue from the one I described previously. If
you are certain that the hung /bin/mount was the one that glusterfs had
spawned, then the cause might be something else. FUSE-based filesystems
mount in two phases. In the first, 'mount -t glusterfs' starts /bin/mount,
which in turn calls /sbin/mount.glusterfs. This starts the glusterfs
binary, which, while initializing the fuse xlator, calls libfuse's
fuse_mount(). libfuse then performs the second phase by calling
'mount -t fuse', which in turn calls /sbin/mount.fuse. I'm trying to
think of how the three machines rebooting together could cause the
second-phase fuse mount to hang.
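
One way to check which phase a hung /bin/mount belongs to is to walk its
parent chain in /proc and see whether glusterfs appears among its
ancestors. The following is only a rough sketch, assuming a Linux /proc
layout; the helper names (parse_stat, ancestry, spawned_by) are made up
for illustration, and if an intermediate process has already exited, the
chain will lead to init instead and the check will not be conclusive.

```python
def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (comm, state, ppid)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    lparen = text.index("(")
    rparen = text.rindex(")")
    comm = text[lparen + 1:rparen]
    fields = text[rparen + 1:].split()
    return comm, fields[0], int(fields[1])


def read_stat(pid):
    """Read /proc/<pid>/stat for a live process."""
    with open(f"/proc/{pid}/stat") as f:
        return f.read()


def ancestry(pid, reader=read_stat):
    """Return [(pid, comm, state), ...] from pid up to the top of the tree."""
    chain = []
    while pid > 0:
        try:
            comm, state, ppid = parse_stat(reader(pid))
        except (OSError, ValueError):
            break  # process vanished or the stat line was unreadable
        chain.append((pid, comm, state))
        pid = ppid
    return chain


def spawned_by(pid, name, reader=read_stat):
    """True if any ancestor of pid (excluding pid itself) is named `name`."""
    return any(comm == name for _, comm, _ in ancestry(pid, reader)[1:])
```

For example, spawned_by(pid_of_hung_mount, "glusterfs") returning True
would suggest the hung process is the second-phase mount. The state
letter in each tuple (the third field of the stat file) shows whether
the process is sleeping, stopped, or in uninterruptible sleep.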

> This is where I discovered the other issue where the 'mount
> /gluster/mountpoint' can return before the mount point is completely set up,
> introducing a race where a user can access the mount point and see an error
> or an empty directory before seeing the actual contents. I don't know if
> these are related or separate issues.

You are right. A fix for this is already under way.
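
Until a fix is available, a script that runs 'mount' and then uses the
mount point immediately could poll /proc/mounts before touching it. This
is a sketch of a possible workaround, not part of glusterfs; the function
names are hypothetical, and it assumes that the appearance of the entry
in /proc/mounts is a good enough signal, which may not cover the case
where the client is mounted but still connecting to its servers.

```python
import os
import re
import time


def unescape(field):
    """Decode the octal escapes (e.g. \\040 for space) used in /proc/mounts."""
    return re.sub(r"\\([0-7]{3})", lambda m: chr(int(m.group(1), 8)), field)


def mounted_paths(mounts_text):
    """Return the set of mount points listed in /proc/mounts-style text."""
    paths = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            paths.add(unescape(fields[1]))  # second field is the mount point
    return paths


def read_proc_mounts():
    with open("/proc/mounts") as f:
        return f.read()


def wait_for_mount(mountpoint, timeout=30.0, interval=0.5,
                   read_mounts=read_proc_mounts, sleep=time.sleep):
    """Poll until mountpoint is listed as mounted; False on timeout."""
    target = os.path.normpath(mountpoint)
    deadline = time.monotonic() + timeout
    while True:
        if target in mounted_paths(read_mounts()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller would run the mount command first and then call
wait_for_mount("/gluster/mountpoint") before listing the directory.
Reading /proc/mounts rather than calling stat on the mount point avoids
blocking on a mount that is itself hung.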

Avati

