Re: Single-process (server and client) AFR problems

Krishna,

Is that to say that it's a bug, or am I just using it wrong? Or do I just have a knack for finding dodgy edge cases?

Is there a workaround?

I have just reconfigured my servers to do 2-process client-side AFR (i.e. the traditional approach), and that works fine. But having single-process server-side AFR would be more efficient, and simplify my config somewhat.
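For reference, the two-process client-side setup I switched to looks roughly like this on the client side (a sketch rather than my exact file; hosts and volume names follow the specs below):

```
volume home1
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.0.1
        option remote-subvolume home1
end-volume

volume home2
        type protocol/client
        option transport-type tcp/client
        option remote-host 192.168.3.1
        option remote-subvolume home2
end-volume

volume home
        type cluster/afr
        subvolumes home1 home2
end-volume
```

With this, each server runs only a protocol/server over its storage/posix volume, and the AFR logic lives in the client process.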

Thanks.

Gordan

On Tue, 20 May 2008, Krishna Srinivas wrote:

In this setup, home1 is sending its CHILD_UP event to the "server" xlator
instead of to the "home" afr xlator (and home2 is not up). This makes afr
think that none of its subvolumes are up. We will fix it to handle this
situation.
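(To illustrate the failure mode with a toy model; this is a sketch of the behaviour described, not actual GlusterFS code:)

```python
# Illustrative sketch only (not GlusterFS source): if home1's CHILD_UP is
# delivered to the wrong parent translator, afr never learns that home1 is
# alive, so with home2 down it sees zero live children and self-heal fails.

class AfrSketch:
    def __init__(self, subvolumes):
        self.child_up = {name: False for name in subvolumes}

    def notify(self, event, child):
        # In the buggy setup this is never called for home1, because the
        # event goes to the "server" xlator instead of to afr.
        if event == "CHILD_UP":
            self.child_up[child] = True

    def selfheal_can_lock(self):
        # Self-heal needs at least one live child to take a lock against.
        return any(self.child_up.values())

afr = AfrSketch(["home1", "home2"])
# home2's server is down (no CHILD_UP), and home1's CHILD_UP went astray:
print(afr.selfheal_can_lock())  # False -> "none of the children are up", EIO
```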

Thanks
Krishna

On Tue, May 20, 2008 at 2:00 PM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
      This is with release 1.3.9.

      Not much else that seems relevant turns up in the logs with -L DEBUG:
      DNS chatter, and mentions that the 2nd server isn't talking (glusterfs
      is switched off on it, because having it up causes the lock-up).

      This gets logged when I try to cat ~/.bashrc:

      2008-05-20 09:14:08 D [fuse-bridge.c:375:fuse_entry_cbk] glusterfs-fuse: 39: (34) /gordan/.bashrc => 60166157
      2008-05-20 09:14:08 D [inode.c:577:__create_inode] fuse/inode: create inode(60166157)
      2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
      2008-05-20 09:14:08 D [inode.c:367:__active_inode] fuse/inode: activating inode(60166157), lru=7/1024
      2008-05-20 09:14:08 D [fuse-bridge.c:1517:fuse_open] glusterfs-fuse: 40: OPEN /gordan/.bashrc
      2008-05-20 09:14:08 E [afr.c:1985:afr_selfheal] home: none of the children are up for locking, returning EIO
      2008-05-20 09:14:08 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse: 40: (12) /gordan/.bashrc => -1 (5)

On the command line, I get back "Input/output error". I can ls the files,
but cannot actually read them.
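The "(5)" at the end of the fuse_fd_cbk line is the errno handed back to the application; 5 is EIO on Linux, which is exactly the "Input/output error" the shell prints. A quick check:

```python
import errno
import os

# errno 5 on Linux is EIO; the C library renders it as "Input/output error",
# which is what cat prints when the open fails.
print(errno.EIO)               # 5
print(os.strerror(errno.EIO))  # Input/output error
```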

This is with only the first server up. Same happens when I mount home.vol
via fstab or via something like:
glusterfs -f /etc/glusterfs/home.vol /home

I have also reduced the config (single-process, intended for the servers)
to a bare minimum (removed the posix-locks layer) to get to the bottom of
it, but I still cannot get any reads to work:

volume home1
       type storage/posix
       option directory /gluster/home
end-volume

volume home2
       type protocol/client
       option transport-type tcp/client
       option remote-host 192.168.3.1
       option remote-subvolume home2
end-volume

volume home
       type cluster/afr
       option read-subvolume home1
       subvolumes home1 home2
end-volume

volume server
       type protocol/server
       option transport-type tcp/server
       subvolumes home home1
       option auth.ip.home.allow 127.0.0.1,192.168.*
       option auth.ip.home1.allow 127.0.0.1,192.168.*
end-volume

On a related note, if single-process mode is used, how does GlusterFS know
which volume to mount? For example, if it tries to mount the
protocol/client volume (home2), then obviously that won't work, because
the 2nd server is not up. If it mounts the protocol/server volume, is it
then trying to mount home or home1? Or does it mount the outermost volume
that _isn't_ a protocol/[client|server] (which is "home" in this case)?
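If the rule really is "mount the outermost volume that isn't a protocol translator", it would be computable from the spec file alone. Here is a rough sketch of that guess (my reading of the behaviour, not GlusterFS code), applied to a trimmed copy of the spec above:

```python
# Sketch of a *guessed* selection rule, not GlusterFS source: a mount
# candidate is a volume that is not itself protocol/client or
# protocol/server, and whose parents (if any) are all protocol xlators.

SPEC = """
volume home1
       type storage/posix
end-volume
volume home2
       type protocol/client
end-volume
volume home
       type cluster/afr
       subvolumes home1 home2
end-volume
volume server
       type protocol/server
       subvolumes home home1
end-volume
"""

def mount_candidates(spec):
    order, types, parents = [], {}, {}
    cur = None
    for line in spec.splitlines():
        tok = line.split()
        if not tok:
            continue
        if tok[0] == "volume":
            cur = tok[1]
            order.append(cur)
        elif tok[0] == "type" and cur:
            types[cur] = tok[1]
        elif tok[0] == "subvolumes" and cur:
            for sub in tok[1:]:
                parents.setdefault(sub, []).append(cur)

    def is_proto(name):
        return types.get(name, "").startswith("protocol/")

    # "outermost" non-protocol volume: every parent (if any) is a protocol xlator
    return [v for v in order
            if not is_proto(v)
            and all(is_proto(p) for p in parents.get(v, []))]

print(mount_candidates(SPEC))  # ['home'] under this guessed rule
```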

Thanks.

Gordan

On Tue, 20 May 2008 13:18:07 +0530, Krishna Srinivas
<krishna@xxxxxxxxxxxxx> wrote:
> Gordan,
>
> Which patch set is this? Can you run glusterfs server side with
> "-L DEBUG" and send the logs?
>
> Thanks
> Krishna
>
> On Tue, May 20, 2008 at 1:56 AM, Gordan Bobic <gordan@xxxxxxxxxx> wrote:
>> Hi,
>>
>> I'm having rather major problems getting single-process AFR to work
>> between two servers. When both servers come up, the GlusterFS on both
>> locks up pretty solid. The processes that try to access the FS
>> (including ls) seem to get nowhere for a few minutes, and then
>> complete. But something gets stuck, and glusterfs cannot be killed
>> even with -9!
>>
>> Another worrying thing is that the fuse kernel module ends up holding
>> a reference count even after the glusterfs process gets killed
>> (sometimes killing the remote process that isn't locked up on its host
>> can break the locked-up operations and allow the local glusterfs
>> process to be killed). So fuse then cannot be unloaded.
>>
>> This error seems to come up in the logs all the time:
>> 2008-05-19 20:57:17 E [afr.c:1985:afr_selfheal] home: none of the
>> children are up for locking, returning EIO
>> 2008-05-19 20:57:17 E [fuse-bridge.c:692:fuse_fd_cbk] glusterfs-fuse:
>> 63: (12) /test => -1 (5)
>>
>> This implies some kind of a locking issue, but the same error and
>> conditions also arise when the posix-locks module is removed.
>>
>> The configs for the two servers are attached. They are almost identical
>> to the examples on the glusterfs wiki:
>>
>> http://www.gluster.org/docs/index.php/AFR_single_process
>>
>> What am I doing wrong? Have I run into another bug?
>>
>> Gordan
>>
>> volume home1-store
>>        type storage/posix
>>        option directory /gluster/home
>> end-volume
>>
>> volume home1
>>        type features/posix-locks
>>        subvolumes home1-store
>> end-volume
>>
>> volume home2
>>        type protocol/client
>>        option transport-type tcp/client
>>        option remote-host 192.168.3.1
>>        option remote-subvolume home2
>> end-volume
>>
>> volume home
>>        type cluster/afr
>>        option read-subvolume home1
>>        subvolumes home1 home2
>> end-volume
>>
>> volume server
>>        type protocol/server
>>        option transport-type tcp/server
>>        subvolumes home home1
>>        option auth.ip.home.allow 127.0.0.1
>>        option auth.ip.home1.allow 192.168.*
>> end-volume
>>
>> volume home2-store
>>        type storage/posix
>>        option directory /gluster/home
>> end-volume
>>
>> volume home2
>>        type features/posix-locks
>>        subvolumes home2-store
>> end-volume
>>
>> volume home1
>>        type protocol/client
>>        option transport-type tcp/client
>>        option remote-host 192.168.0.1
>>        option remote-subvolume home1
>> end-volume
>>
>> volume home
>>        type cluster/afr
>>        option read-subvolume home2
>>        subvolumes home1 home2
>> end-volume
>>
>> volume server
>>        type protocol/server
>>        option transport-type tcp/server
>>        subvolumes home home2
>>        option auth.ip.home.allow 127.0.0.1
>>        option auth.ip.home2.allow 192.168.*
>> end-volume
>>



_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
http://lists.nongnu.org/mailman/listinfo/gluster-devel



