Gordan,

It is a bug in the way xlators notify UP/DOWN status to other
xlators. Thanks for reporting it to us :)

The workaround is to run the "protocol/server" xlator in a separate
process.

Krishna
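Concretely, the workaround above amounts to splitting the single spec
into a server spec run by its own glusterfsd process and a small client
spec for the mount; unlike the 2-process client-side AFR fallback
mentioned below, AFR itself still runs on the server side. A minimal
sketch against the "home" spec quoted below (the file names, the
loopback address, and the exact invocations are assumptions for
illustration, not tested config):

  # Run the existing posix/afr/protocol-server stack (quoted below,
  # unchanged) in its own process, e.g.:
  #   glusterfsd -f /etc/glusterfs/glusterfsd.vol

  # /etc/glusterfs/home-client.vol -- mounted by a second process:
  #   glusterfs -f /etc/glusterfs/home-client.vol /home
  volume home
    type protocol/client
    option transport-type tcp/client
    option remote-host 127.0.0.1      # reach the local server process
    option remote-subvolume home      # the afr volume exported below
  end-volume

The loopback client is authorized by the existing
"option auth.ip.home.allow 127.0.0.1,..." line in the quoted spec.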
On Tue, May 20, 2008 at 3:51 PM, <gordan@xxxxxxxxxx> wrote:

> Krishna,
>
> Is that to say that it's a bug, or am I just using it wrong? Or do I
> just have a knack for finding dodgy edge cases?
>
> Is there a workaround?
>
> I have just reconfigured my servers to do 2-process client-side AFR
> (i.e. the traditional approach), and that works fine. But having
> single-process server-side AFR would be more efficient, and it would
> simplify my config somewhat.
>
> Thanks.
>
> Gordan
>
> On Tue, 20 May 2008, Krishna Srinivas wrote:
>
>> In this setup, home1 is sending its CHILD_UP event to the "server"
>> xlator instead of to the "home" afr xlator (and home2 is not up).
>> This makes afr think that none of its subvolumes are up. We will fix
>> it to handle this situation.
>>
>> Thanks
>> Krishna
>>
>> On Tue, May 20, 2008 at 2:00 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
>>
>>> This is with release 1.3.9.
>>>
>>> Not much more that seems relevant turns up in the logs with -L DEBUG
>>> (DNS chatter, and mentions that the 2nd server isn't talking;
>>> glusterfs is switched off on it because that causes the lock-up).
>>>
>>> This gets logged when I try to cat ~/.bashrc:
>>>
>>> 2008-05-20 09:14:08 D [fuse-bridge.c:375:fuse_entry_cbk]
>>>   glusterfs-fuse: 39: (34) /gordan/.bashrc => 60166157
>>> 2008-05-20 09:14:08 D [inode.c:577:__create_inode] fuse/inode:
>>>   create inode(60166157)
>>> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode:
>>>   activating inode(60166157), lru=7/1024
>>> 2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode:
>>>   activating inode(60166157), lru=7/1024
>>> 2008-05-20 09:14:08 D [fuse-bridge.c:1517:fuse_open] glusterfs-fuse:
>>>   40: OPEN /gordan/.bashrc
>>> 2008-05-20 09:14:08 E [afr.c:1985:afr_selfheal] home: none of the
>>>   children are up for locking, returning EIO
>>> 2008-05-20 09:14:08 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse:
>>>   40: (12) /gordan/.bashrc => -1 (5)
>>>
>>> On the command line, I get back "Input/output error". I can ls the
>>> files, but cannot actually read them.
>>>
>>> This is with only the first server up. The same thing happens whether
>>> I mount home.vol via fstab or via something like:
>>>
>>>   glusterfs -f /etc/glusterfs/home.vol /home
>>>
>>> I have also reduced the config (single-process, intended for servers)
>>> to a bare minimum (removed the posix-locks layer) to get to the
>>> bottom of it, but I cannot get any reads to work:
>>>
>>> volume home1
>>>   type storage/posix
>>>   option directory /gluster/home
>>> end-volume
>>>
>>> volume home2
>>>   type protocol/client
>>>   option transport-type tcp/client
>>>   option remote-host 192.168.3.1
>>>   option remote-subvolume home2
>>> end-volume
>>>
>>> volume home
>>>   type cluster/afr
>>>   option read-subvolume home1
>>>   subvolumes home1 home2
>>> end-volume
>>>
>>> volume server
>>>   type protocol/server
>>>   option transport-type tcp/server
>>>   subvolumes home home1
>>>   option auth.ip.home.allow 127.0.0.1,192.168.*
>>>   option auth.ip.home1.allow 127.0.0.1,192.168.*
>>> end-volume
>>>
>>> On a related note: if single-process mode is used, how does GlusterFS
>>> know which volume to mount? For example, if it tried to mount the
>>> protocol/client volume (home2), then obviously that won't work,
>>> because the 2nd server is not up. If it mounts the protocol/server
>>> volume, is it then trying to mount home or home1? Or does it mount
>>> the outermost volume that _isn't_ a protocol/[client|server] volume
>>> (which is "home" in this case)?
>>>
>>> Thanks.
>>>
>>> Gordan
>>>
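On the volume-selection question just above: as far as I recall the
1.3.x tools, glusterfs mounts the top-most volume in the spec file by
default, and there is a command-line flag to name one explicitly (both
the flag and the default here are from memory and worth verifying
against glusterfs --help):

  # Mount the "home" afr volume explicitly instead of relying on
  # spec-file order:
  glusterfs -f /etc/glusterfs/home.vol --volume-name=home /home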
>>> On Tue, 20 May 2008 13:18:07 +0530, Krishna Srinivas
>>> <krishna@xxxxxxxxxxxxx> wrote:
>>>
>>>> Gordan,
>>>>
>>>> Which patch set is this? Can you run glusterfs server-side with
>>>> "-L DEBUG" and send the logs?
>>>>
>>>> Thanks
>>>> Krishna
>>>>
>>>> On Tue, May 20, 2008 at 1:56 AM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm having rather major problems getting single-process AFR to work
>>>>> between two servers. When both servers come up, GlusterFS on both
>>>>> locks up pretty solid. The processes that try to access the FS
>>>>> (including ls) seem to get nowhere for a few minutes, and then
>>>>> complete. But something gets stuck, and glusterfs cannot be killed
>>>>> even with -9!
>>>>>
>>>>> Another worrying thing is that the fuse kernel module ends up
>>>>> holding a reference count even after the glusterfs process gets
>>>>> killed (sometimes killing the remote process that isn't locked up
>>>>> on its host can break the locked-up operations and allow the local
>>>>> glusterfs process to be killed), so fuse then cannot be unloaded.
>>>>>
>>>>> This error seems to come up in the logs all the time:
>>>>>
>>>>> 2008-05-19 20:57:17 E [afr.c:1985:afr_selfheal] home: none of the
>>>>>   children are up for locking, returning EIO
>>>>> 2008-05-19 20:57:17 E [fuse-bridge.c:692:fuse_fd_cbk]
>>>>>   glusterfs-fuse: 63: (12) /test => -1 (5)
>>>>>
>>>>> This implies some kind of locking issue, but the same error and
>>>>> conditions also arise when the posix-locks module is removed.
>>>>>
>>>>> The configs for the two servers are attached. They are almost
>>>>> identical to the examples on the GlusterFS wiki:
>>>>>
>>>>> http://www.gluster.org/docs/index.php/AFR_single_process
>>>>>
>>>>> What am I doing wrong? Have I run into another bug?
>>>>>
>>>>> Gordan
>>>>>
>>>>> # --- server 1 spec ---
>>>>>
>>>>> volume home1-store
>>>>>   type storage/posix
>>>>>   option directory /gluster/home
>>>>> end-volume
>>>>>
>>>>> volume home1
>>>>>   type features/posix-locks
>>>>>   subvolumes home1-store
>>>>> end-volume
>>>>>
>>>>> volume home2
>>>>>   type protocol/client
>>>>>   option transport-type tcp/client
>>>>>   option remote-host 192.168.3.1
>>>>>   option remote-subvolume home2
>>>>> end-volume
>>>>>
>>>>> volume home
>>>>>   type cluster/afr
>>>>>   option read-subvolume home1
>>>>>   subvolumes home1 home2
>>>>> end-volume
>>>>>
>>>>> volume server
>>>>>   type protocol/server
>>>>>   option transport-type tcp/server
>>>>>   subvolumes home home1
>>>>>   option auth.ip.home.allow 127.0.0.1
>>>>>   option auth.ip.home1.allow 192.168.*
>>>>> end-volume
>>>>>
>>>>> # --- server 2 spec ---
>>>>>
>>>>> volume home2-store
>>>>>   type storage/posix
>>>>>   option directory /gluster/home
>>>>> end-volume
>>>>>
>>>>> volume home2
>>>>>   type features/posix-locks
>>>>>   subvolumes home2-store
>>>>> end-volume
>>>>>
>>>>> volume home1
>>>>>   type protocol/client
>>>>>   option transport-type tcp/client
>>>>>   option remote-host 192.168.0.1
>>>>>   option remote-subvolume home1
>>>>> end-volume
>>>>>
>>>>> volume home
>>>>>   type cluster/afr
>>>>>   option read-subvolume home2
>>>>>   subvolumes home1 home2
>>>>> end-volume
>>>>>
>>>>> volume server
>>>>>   type protocol/server
>>>>>   option transport-type tcp/server
>>>>>   subvolumes home home2
>>>>>   option auth.ip.home.allow 127.0.0.1
>>>>>   option auth.ip.home2.allow 192.168.*
>>>>> end-volume
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel