Re: Could be the bug of Glusterfs? The file system is unstable and hang


 



We applied the patch mentioned in the thread and used a fixed thread count in the server config. Unfortunately, we got the same error:

[2009-06-03 04:57:36] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 22347008: ERR => -1 (Resource temporarily unavailable)
[2009-06-03 07:55:04] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 23431094: ERR => -1 (Resource temporarily unavailable)
[2009-06-03 15:58:25] E [client-protocol.c:292:call_bail] brick1: bailing out frame LOOKUP(32) frame sent = 2009-06-03 15:28:23. frame-timeout = 1800
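For context, the EAGAIN warnings from fuse_setlk_cbk are the FUSE layer relaying failed fcntl(F_SETLK) record-lock requests from applications on the mount. A hypothetical standalone sketch of the same lock request (plain Python against an ordinary path, not GlusterFS code; point it at a file on the GlusterFS mount to exercise the locks translator):

```python
import fcntl
import os
import tempfile

def try_setlk(path):
    """Attempt a non-blocking exclusive POSIX record lock on path.

    This is the same kind of request FUSE forwards to glusterfs as a
    SETLK operation. Returns True if the lock was granted, False if it
    failed (e.g. EAGAIN, "Resource temporarily unavailable").
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking exclusive lock on the whole file; raises OSError
        # if the lock cannot be granted.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Demonstrate against a throwaway local file; on a healthy mount
    # this should print True.
    with tempfile.NamedTemporaryFile() as f:
        print(try_setlk(f.name))
```

If this fails against the mount while succeeding on local disk, the problem is in the lock path (features/locks or the network), not in the application.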

John


On Tue, Jun 2, 2009 at 12:25 AM, Shehjar Tikoo <shehjart@xxxxxxxxxxx> wrote:

Hi


> Also, avoid using autoscaling in io-threads for now.
>
> -Shehjar

-Shehjar

Alpha Electronics wrote:
Thanks for looking into this. We do use io-threads. Here is the server config:

volume brick1-posix
 type storage/posix
 option directory /mnt/brick1
end-volume

volume brick2-posix
 type storage/posix
 option directory /mnt/brick2
end-volume

volume brick1-locks
 type features/locks
 subvolumes brick1-posix
end-volume

volume brick2-locks
 type features/locks
 subvolumes brick2-posix
end-volume

volume brick1
 type performance/io-threads
 option min-threads 16
 option autoscaling on
 subvolumes brick1-locks
end-volume

volume brick2
 type performance/io-threads
 option min-threads 16
 option autoscaling on
 subvolumes brick2-locks
end-volume

volume server
 type protocol/server
 option transport-type tcp
 option auth.addr.brick1.allow *
 option auth.addr.brick2.allow *
 subvolumes brick1 brick2
end-volume

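Following Shehjar's advice to avoid autoscaling, each io-threads volume can instead run a fixed pool. A sketch for one brick, assuming the 2.0.x io-threads option name thread-count in place of the autoscaling options:

```
volume brick1
 type performance/io-threads
 # Fixed-size pool; autoscaling disabled by omitting the option.
 option thread-count 16
 subvolumes brick1-locks
end-volume
```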


On Sun, May 31, 2009 at 11:44 PM, Shehjar Tikoo <shehjart@xxxxxxxxxxx> wrote:

   Alpha Electronics wrote:

       We are testing GlusterFS before recommending it to enterprise
       clients. We found that the file system always hangs after
       running for about two days. After killing the server-side
       process and restarting it, everything goes back to normal.


   What is the server config?
   If you're not using io-threads on the server, I suggest you do,
   because it does basic load-balancing to avoid timeouts.

   Also, avoid using autoscaling in io-threads for now.

   -Shehjar


        Here are the specs and the errors logged:
       GlusterFS version: v2.0.1

       Client volume:
       volume brick_1
        type protocol/client
        option transport-type tcp/client
        option remote-port 7777 # Non-default port
        option remote-host server1
        option remote-subvolume brick
       end-volume

       volume brick_2
        type protocol/client
        option transport-type tcp/client
        option remote-port 7777 # Non-default port
        option remote-host server2
        option remote-subvolume brick
       end-volume

       volume bricks
        type cluster/distribute
        subvolumes brick_1 brick_2
       end-volume

       Errors logged on the client side in /var/log/glusterfs.log:
       [2009-05-29 14:58:55] E [client-protocol.c:292:call_bail]
       brick_1: bailing out frame LK(28) frame sent = 2009-05-29
       14:28:54. frame-timeout = 1800
       [2009-05-29 14:58:55] W [fuse-bridge.c:2284:fuse_setlk_cbk]
       glusterfs-fuse: 106850788: ERR => -1 (Transport endpoint is not
       connected)
       Errors logged on the server:
       [2009-05-29 14:59:15] E [client-protocol.c:292:call_bail]
       brick_2: bailing out frame LK(28) frame sent = 2009-05-29
       14:29:05. frame-timeout = 1800
       [2009-05-29 14:59:15] W [fuse-bridge.c:2284:fuse_setlk_cbk]
       glusterfs-fuse: 106850860: ERR => -1 (Transport endpoint is not
       connected)
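Incidentally, the 1800-second bail-outs in the call_bail lines match the protocol/client frame-timeout default. If you want hung frames surfaced sooner while debugging, the timeout can be lowered in the client volume; a sketch, assuming the 2.0.x protocol/client option name frame-timeout (in seconds):

```
volume brick_1
 type protocol/client
 option transport-type tcp/client
 option remote-port 7777 # Non-default port
 option remote-host server1
 option remote-subvolume brick
 # Bail out waiting frames after 10 minutes instead of the 1800s default.
 option frame-timeout 600
end-volume
```

This only changes how quickly the hang is reported, not the underlying cause.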

       There are error messages logged on the server side an hour
       later in /var/log/messages:
       May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
       lib/util_sock.c:write_data(564)
       May 29 16:04:16 server2 winbindd[3649]:   write_data: write
       failure. Error = Connection reset by peer
       May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
       libsmb/clientgen.c:write_socket(158)
       May 29 16:04:16 server2 winbindd[3649]:   write_socket: Error
       writing 104 bytes to socket 18: ERRNO = Connection reset by peer
       May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
       libsmb/clientgen.c:cli_send_smb(188)
       May 29 16:04:16 server2 winbindd[3649]:   Error writing 104
       bytes to client. -1 (Connection reset by peer)
       May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
       libsmb/cliconnect.c:cli_session_setup_spnego(859)
       May 29 16:04:16 server2 winbindd[3649]:   Kinit failed: Cannot
       contact any KDC for requested realm


       ------------------------------------------------------------------------

       _______________________________________________
       Gluster-devel mailing list
       Gluster-devel@xxxxxxxxxx




