I am trying a workaround with clients:

volume pnc4
  type protocol/client
  option transport-type tcp
  option remote-host teoria4
  option frame-timeout 180000
  option ping-timeout 1
  option remote-subvolume dados
end-volume

....

volume replicate
  type cluster/replicate
  subvolumes teoria3 teoria4
end-volume

On the server side, I avoid autoscaling in io-threads. This way the
"bailing out frame" error disappeared and the system is stable.
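For reference, the server-side half of this workaround is just an
io-threads volume with a fixed pool instead of autoscaling. A minimal
sketch, assuming the fixed pool size is set with the thread-count option;
the volume names and the count of 16 are illustrative, not taken from the
thread:

volume dados-iot
  type performance/io-threads
  option thread-count 16      # fixed pool size; no autoscaling option set
  subvolumes dados-locks      # assumed features/locks volume over storage/posix
end-volume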
2009/6/3 Alpha Electronics <myitouchs@xxxxxxxxx>:
> We applied the patch mentioned in the thread, and use a fixed thread
> count in the server config. Unfortunately, we got the same error:
>
> [2009-06-03 04:57:36] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 22347008: ERR => -1 (Resource temporarily unavailable)
> [2009-06-03 07:55:04] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 23431094: ERR => -1 (Resource temporarily unavailable)
> [2009-06-03 15:58:25] E [client-protocol.c:292:call_bail] brick1: bailing out frame LOOKUP(32) frame sent = 2009-06-03 15:28:23. frame-timeout = 1800
>
> John
>
> On Tue, Jun 2, 2009 at 12:25 AM, Shehjar Tikoo <shehjart@xxxxxxxxxxx> wrote:
>> Hi
>>
>> > Also, avoid using autoscaling in io-threads for now.
>> >
>> > -Shehjar
>>
>> -Shehjar
>>
>> Alpha Electronics wrote:
>>> Thanks for looking into this. We do use io-threads. Here is the
>>> server config:
>>>
>>> volume brick1-posix
>>>   type storage/posix
>>>   option directory /mnt/brick1
>>> end-volume
>>>
>>> volume brick2-posix
>>>   type storage/posix
>>>   option directory /mnt/brick2
>>> end-volume
>>>
>>> volume brick1-locks
>>>   type features/locks
>>>   subvolumes brick1-posix
>>> end-volume
>>>
>>> volume brick2-locks
>>>   type features/locks
>>>   subvolumes brick2-posix
>>> end-volume
>>>
>>> volume brick1
>>>   type performance/io-threads
>>>   option min-threads 16
>>>   option autoscaling on
>>>   subvolumes brick1-locks
>>> end-volume
>>>
>>> volume brick2
>>>   type performance/io-threads
>>>   option min-threads 16
>>>   option autoscaling on
>>>   subvolumes brick2-locks
>>> end-volume
>>>
>>> volume server
>>>   type protocol/server
>>>   option transport-type tcp
>>>   option auth.addr.brick1.allow *
>>>   option auth.addr.brick2.allow *
>>>   subvolumes brick1 brick2
>>> end-volume
>>>
>>> On Sun, May 31, 2009 at 11:44 PM, Shehjar Tikoo <shehjart@xxxxxxxxxxx> wrote:
>>>> Alpha Electronics wrote:
>>>>> We are testing GlusterFS before recommending it to enterprise
>>>>> clients. We found that the file system always hangs after running
>>>>> for about 2 days. After killing the server-side process and
>>>>> restarting it, everything goes back to normal.
>>>>
>>>> What is the server config?
>>>> If you're not using io-threads on the server, I suggest you do,
>>>> because it does basic load-balancing to avoid timeouts.
>>>>
>>>> Also, avoid using autoscaling in io-threads for now.
>>>>
>>>> -Shehjar
>>>>
>>>>> Here is the spec, and the errors logged:
>>>>> GlusterFS version: v2.0.1
>>>>>
>>>>> Client volume:
>>>>> volume brick_1
>>>>>   type protocol/client
>>>>>   option transport-type tcp/client
>>>>>   option remote-port 7777 # Non-default port
>>>>>   option remote-host server1
>>>>>   option remote-subvolume brick
>>>>> end-volume
>>>>>
>>>>> volume brick_2
>>>>>   type protocol/client
>>>>>   option transport-type tcp/client
>>>>>   option remote-port 7777 # Non-default port
>>>>>   option remote-host server2
>>>>>   option remote-subvolume brick
>>>>> end-volume
>>>>>
>>>>> volume bricks
>>>>>   type cluster/distribute
>>>>>   subvolumes brick_1 brick_2
>>>>> end-volume
>>>>>
>>>>> Errors logged on the client side in /var/log/glusterfs.log:
>>>>> [2009-05-29 14:58:55] E [client-protocol.c:292:call_bail] brick_1: bailing out frame LK(28) frame sent = 2009-05-29 14:28:54. frame-timeout = 1800
>>>>> [2009-05-29 14:58:55] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 106850788: ERR => -1 (Transport endpoint is not connected)
>>>>>
>>>>> Errors logged on the server:
>>>>> [2009-05-29 14:59:15] E [client-protocol.c:292:call_bail] brick_2: bailing out frame LK(28) frame sent = 2009-05-29 14:29:05. frame-timeout = 1800
>>>>> [2009-05-29 14:59:15] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 106850860: ERR => -1 (Transport endpoint is not connected)
>>>>>
>>>>> There are also error messages logged on the server side after 1 hour
>>>>> in /var/log/messages:
>>>>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] lib/util_sock.c:write_data(564)
>>>>> May 29 16:04:16 server2 winbindd[3649]: write_data: write failure. Error = Connection reset by peer
>>>>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] libsmb/clientgen.c:write_socket(158)
>>>>> May 29 16:04:16 server2 winbindd[3649]: write_socket: Error writing 104 bytes to socket 18: ERRNO = Connection reset by peer
>>>>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] libsmb/clientgen.c:cli_send_smb(188)
>>>>> May 29 16:04:16 server2 winbindd[3649]: Error writing 104 bytes to client. -1 (Connection reset by peer)
>>>>> May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0] libsmb/cliconnect.c:cli_session_setup_spnego(859)
>>>>> May 29 16:04:16 server2 winbindd[3649]: Kinit failed: Cannot contact any KDC for requested realm
>
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>

--
Rodrigo Azevedo Moreira da Silva
Departamento de Física
Universidade Federal de Pernambuco
http://www.df.ufpe.br
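The call_bail lines quoted throughout the thread are the client unwinding a
frame whose reply has not arrived within frame-timeout; the "frame-timeout =
1800" in the logs is the default of 1800 seconds. A minimal client volume
raising that limit, in the spirit of the pnc4 workaround at the top of the
thread; the host and subvolume names here are illustrative:

volume brick_1
  type protocol/client
  option transport-type tcp
  option remote-host server1       # illustrative host
  option remote-subvolume brick    # illustrative remote volume
  option frame-timeout 180000      # seconds; call_bail fires after this, default 1800
end-volume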