Re: Could be the bug of Glusterfs? The file system is unstable and hang


 



We applied the patch mentioned in the thread and used a fixed thread count in the server config. Unfortunately, we got the same error:

[2009-06-03 04:57:36] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 22347008: ERR => -1 (Resource temporarily unavailable)
[2009-06-03 07:55:04] W [fuse-bridge.c:2284:fuse_setlk_cbk] glusterfs-fuse: 23431094: ERR => -1 (Resource temporarily unavailable)
[2009-06-03 15:58:25] E [client-protocol.c:292:call_bail] brick1: bailing out frame LOOKUP(32) frame sent = 2009-06-03 15:28:23. frame-timeout = 1800
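For context, the EAGAIN warnings from fuse_setlk_cbk are the FUSE layer relaying failed fcntl(F_SETLK) record-lock requests from applications on the mount. A hypothetical standalone sketch of the same lock request (plain Python against an ordinary path, not GlusterFS code; point it at a file on the GlusterFS mount to exercise the locks translator):

```python
import fcntl
import os
import tempfile

def try_setlk(path):
    """Attempt a non-blocking exclusive POSIX record lock on path.

    This is the same kind of request FUSE forwards to glusterfs as a
    SETLK operation. Returns True if the lock was granted, False if it
    failed (e.g. EAGAIN, "Resource temporarily unavailable").
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking exclusive lock on the whole file; raises OSError
        # if the lock cannot be granted.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)

if __name__ == "__main__":
    # Demonstrate against a throwaway local file; on a healthy mount
    # this should print True.
    with tempfile.NamedTemporaryFile() as f:
        print(try_setlk(f.name))
```

If this fails against the mount while succeeding on local disk, the problem is in the lock path (features/locks or the network), not in the application.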

John


On Tue, Jun 2, 2009 at 12:25 AM, Shehjar Tikoo <shehjart@xxxxxxxxxxx> wrote:

Hi


> Also, avoid using autoscaling in io-threads for now.
>
> -Shehjar

-Shehjar

Alpha Electronics wrote:
Thanks for looking into this. We do use io-threads. Here is the server config:

volume brick1-posix
 type storage/posix
 option directory /mnt/brick1
end-volume

volume brick2-posix
 type storage/posix
 option directory /mnt/brick2
end-volume

volume brick1-locks
 type features/locks
 subvolumes brick1-posix
end-volume

volume brick2-locks
 type features/locks
 subvolumes brick2-posix
end-volume

volume brick1
 type performance/io-threads
 option min-threads 16
 option autoscaling on
 subvolumes brick1-locks
end-volume

volume brick2
 type performance/io-threads
 option min-threads 16
 option autoscaling on
 subvolumes brick2-locks
end-volume

volume server
 type protocol/server
 option transport-type tcp
 option auth.addr.brick1.allow *
 option auth.addr.brick2.allow *
 subvolumes brick1 brick2
end-volume

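Following Shehjar's advice to avoid autoscaling, each io-threads volume can instead run a fixed pool. A sketch for one brick, assuming the 2.0.x io-threads option name thread-count in place of the autoscaling options:

```
volume brick1
 type performance/io-threads
 # Fixed-size pool; autoscaling disabled by omitting the option.
 option thread-count 16
 subvolumes brick1-locks
end-volume
```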


On Sun, May 31, 2009 at 11:44 PM, Shehjar Tikoo <shehjart@xxxxxxxxxxx> wrote:

   Alpha Electronics wrote:

       We are testing GlusterFS before recommending it to enterprise
       clients. We found that the file system always hangs after
       running for about two days. After killing the server-side
       process and restarting it, everything goes back to normal.


   What is the server config?
   If you're not using io-threads on the server, I suggest you do,
   because it does basic load-balancing to avoid timeouts.

   Also, avoid using autoscaling in io-threads for now.

   -Shehjar


        Here are the specs and the errors logged:
       GlusterFS version: v2.0.1

       Client volume:
       volume brick_1
        type protocol/client
        option transport-type tcp/client
        option remote-port 7777 # Non-default port
        option remote-host server1
        option remote-subvolume brick
       end-volume

       volume brick_2
        type protocol/client
        option transport-type tcp/client
        option remote-port 7777 # Non-default port
        option remote-host server2
        option remote-subvolume brick
       end-volume

       volume bricks
        type cluster/distribute
        subvolumes brick_1 brick_2
       end-volume

       Errors logged on the client side in /var/log/glusterfs.log:
       [2009-05-29 14:58:55] E [client-protocol.c:292:call_bail]
       brick_1: bailing out frame LK(28) frame sent = 2009-05-29
       14:28:54. frame-timeout = 1800
       [2009-05-29 14:58:55] W [fuse-bridge.c:2284:fuse_setlk_cbk]
       glusterfs-fuse: 106850788: ERR => -1 (Transport endpoint is not
       connected)
       Errors logged on the server:
       [2009-05-29 14:59:15] E [client-protocol.c:292:call_bail]
       brick_2: bailing out frame LK(28) frame sent = 2009-05-29
       14:29:05. frame-timeout = 1800
       [2009-05-29 14:59:15] W [fuse-bridge.c:2284:fuse_setlk_cbk]
       glusterfs-fuse: 106850860: ERR => -1 (Transport endpoint is not
       connected)
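Incidentally, the 1800-second bail-outs in the call_bail lines match the protocol/client frame-timeout default. If you want hung frames surfaced sooner while debugging, the timeout can be lowered in the client volume; a sketch, assuming the 2.0.x protocol/client option name frame-timeout (in seconds):

```
volume brick_1
 type protocol/client
 option transport-type tcp/client
 option remote-port 7777 # Non-default port
 option remote-host server1
 option remote-subvolume brick
 # Bail out waiting frames after 10 minutes instead of the 1800s default.
 option frame-timeout 600
end-volume
```

This only changes how quickly the hang is reported, not the underlying cause.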

       There are error messages logged on the server side an hour
       later in /var/log/messages:
       May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
       lib/util_sock.c:write_data(564)
       May 29 16:04:16 server2 winbindd[3649]:   write_data: write
       failure. Error = Connection reset by peer
       May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
       libsmb/clientgen.c:write_socket(158)
       May 29 16:04:16 server2 winbindd[3649]:   write_socket: Error
       writing 104 bytes to socket 18: ERRNO = Connection reset by peer
       May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
       libsmb/clientgen.c:cli_send_smb(188)
       May 29 16:04:16 server2 winbindd[3649]:   Error writing 104
       bytes to client. -1 (Connection reset by peer)
       May 29 16:04:16 server2 winbindd[3649]: [2009/05/29 16:05:16, 0]
       libsmb/cliconnect.c:cli_session_setup_spnego(859)
       May 29 16:04:16 server2 winbindd[3649]:   Kinit failed: Cannot
       contact any KDC for requested realm


       ------------------------------------------------------------------------

       _______________________________________________
       Gluster-devel mailing list
       Gluster-devel@xxxxxxxxxx




