Hi,
I'm having rather major problems getting single-process AFR to work
between two servers. When both servers come up, GlusterFS locks up
pretty solid on both of them. Processes that try to access the FS
(including ls) seem to get nowhere for a few minutes, and then complete.
But something stays stuck, and glusterfs cannot be killed even with
kill -9!
Another worrying thing is that the fuse kernel module still holds a
reference count even after the glusterfs process gets killed, so fuse
cannot be unloaded. (Sometimes killing the remote glusterfs process,
which isn't locked up on its own host, breaks the locked-up operations
and allows the local glusterfs process to be killed.)
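For anyone trying to reproduce this, the leftover reference count can be
read from /proc/modules (the same data lsmod prints). Below is a minimal
sketch, assuming the standard Linux /proc/modules layout (name, size,
reference count, ...). The helper name module_refcount and the sample
text are only illustrative, not output captured from these servers:

```python
def module_refcount(proc_modules_text, name):
    """Return the reference count of a kernel module, or None if not loaded.

    Each /proc/modules line looks like:
    name size refcount dependencies state address
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == name:
            return int(fields[2])
    return None


# Illustrative sample, not output captured from the affected servers.
sample = "fuse 49436 1 - Live 0xf8c3a000\nloop 17800 0 - Live 0xf8a2b000\n"
print(module_refcount(sample, "fuse"))
```

On a live host the same function can be fed the real file, e.g.
module_refcount(open("/proc/modules").read(), "fuse"). A non-zero value
after glusterfs has exited would match what I am seeing.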
This error seems to come up in the logs all the time:
2008-05-19 20:57:17 E [afr.c:1985:afr_selfheal] home: none of the children are up for locking, returning EIO
2008-05-19 20:57:17 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 63: (12) /test => -1 (5)
This implies some kind of locking issue, but the same error and
conditions also arise when the posix locking module is removed.
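In case it helps with reading the log: assuming the trailing (5) in the
fuse-bridge line is a plain Linux errno value, it decodes to EIO, which
matches the afr_selfheal message. A quick Python check (the variable
names are just for illustration):

```python
import errno
import os

code = 5  # trailing "(5)" from the fuse-bridge log line, assumed to be an errno
name = errno.errorcode[code]  # symbolic name for that errno value
print(name, "-", os.strerror(code))
```

So both log lines appear to report the same I/O error, once from AFR and
once as it is passed back through fuse.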
The configs for the two servers are attached. They are almost identical
to the examples on the glusterfs wiki:
http://www.gluster.org/docs/index.php/AFR_single_process
What am I doing wrong? Have I run into another bug?
Gordan
# First config: home1 is the local store, home2 is the remote peer

volume home1-store
  type storage/posix
  option directory /gluster/home
end-volume

volume home1
  type features/posix-locks
  subvolumes home1-store
end-volume

volume home2
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.3.1
  option remote-subvolume home2
end-volume

volume home
  type cluster/afr
  option read-subvolume home1
  subvolumes home1 home2
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes home home1
  option auth.ip.home.allow 127.0.0.1
  option auth.ip.home1.allow 192.168.*
end-volume

# Second config: home2 is the local store, home1 is the remote peer

volume home2-store
  type storage/posix
  option directory /gluster/home
end-volume

volume home2
  type features/posix-locks
  subvolumes home2-store
end-volume

volume home1
  type protocol/client
  option transport-type tcp/client
  option remote-host 192.168.0.1
  option remote-subvolume home1
end-volume

volume home
  type cluster/afr
  option read-subvolume home2
  subvolumes home1 home2
end-volume

volume server
  type protocol/server
  option transport-type tcp/server
  subvolumes home home2
  option auth.ip.home.allow 127.0.0.1
  option auth.ip.home2.allow 192.168.*
end-volume