Is this a new setup, or did it work before? How are the CPU, memory, etc.?
Also, what do you see on the gluster nodes?

On Wed, Mar 14, 2012 at 7:33 PM, Alessio Checcucci
<alessio.checcucci at gmail.com> wrote:

> Dear All,
> we are facing a problem in our computer room. We have 6 servers that act
> as bricks for GlusterFS; the servers are configured in the following way:
>
> OS: CentOS 6.2 x86_64
> Kernel: 2.6.32-220.4.2.el6.x86_64
>
> Gluster RPM packages:
> glusterfs-core-3.2.5-2.el6.x86_64
> glusterfs-rdma-3.2.5-2.el6.x86_64
> glusterfs-geo-replication-3.2.5-2.el6.x86_64
> glusterfs-fuse-3.2.5-2.el6.x86_64
>
> Each one contributes an XFS filesystem to the global volume; the
> transport mechanism is RDMA:
>
> gluster volume create HPC_data transport rdma pleiades01:/data
> pleiades02:/data pleiades03:/data pleiades04:/data pleiades05:/data
> pleiades06:/data
>
> Each server mounts the volume, using the FUSE driver, on a dedicated
> mount point according to the following fstab entry:
>
> pleiades01:/HPC_data  /HPCdata  glusterfs  defaults,_netdev  0 0
>
> We are running MongoDB on top of the Gluster volume for performance
> testing, and speed is definitely high. Unfortunately, when we run a large
> mongoimport job, the GlusterFS volume hangs completely a short time after
> the job starts and becomes inaccessible from any node. The following error
> is logged after some time in /var/log/messages:
>
> Mar 8 08:16:03 pleiades03 kernel: INFO: task mongod:5508 blocked for more than 120 seconds.
> Mar 8 08:16:03 pleiades03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Mar 8 08:16:03 pleiades03 kernel: mongod        D 0000000000000007     0  5508      1 0x00000000
> Mar 8 08:16:03 pleiades03 kernel: ffff881709b95de8 0000000000000086 0000000000000000 0000000000000008
> Mar 8 08:16:03 pleiades03 kernel: ffff881709b95d68 ffffffff81090a7f ffff8816b6974cc0 0000000000000000
> Mar 8 08:16:03 pleiades03 kernel: ffff8817fdd81af8 ffff881709b95fd8 000000000000f4e8 ffff8817fdd81af8
> Mar 8 08:16:03 pleiades03 kernel: Call Trace:
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff81090a7f>] ? wake_up_bit+0x2f/0x40
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff81090d7e>] ? prepare_to_wait+0x4e/0x80
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffffa112c6b5>] fuse_set_nowrite+0xa5/0xe0 [fuse]
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffffa112fd48>] fuse_fsync_common+0xa8/0x180 [fuse]
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffffa112fe30>] fuse_fsync+0x10/0x20 [fuse]
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff811a52d1>] vfs_fsync_range+0xa1/0xe0
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff811a537d>] vfs_fsync+0x1d/0x20
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff81144421>] sys_msync+0x151/0x1e0
> Mar 8 08:16:03 pleiades03 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>
> Any attempt to access the volume from any node is fruitless until the
> mongodb process is killed; sessions accessing the /HPCdata path freeze on
> every node.
> A complete stop (force) and start of the volume is then needed to bring
> it back to an operational state.
> The situation can be reproduced at will.
> Is there anybody able to help us? Could we collect more pieces of
> information to help diagnose the problem?
>
> Thanks a lot
> Alessio
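
To gather that, something like the following on each node while you reproduce
the hang would help (a rough sketch; the log file names assume the default
GlusterFS 3.2 layout, where the client log is named after the mount point and
the brick log after the brick path, so adjust if your install differs):

# CPU and memory snapshot on each brick server
top -b -n 1 | head -20
free -m

# Gluster view from any node
gluster peer status
gluster volume info HPC_data

# Client (FUSE mount) and brick logs around the time of the hang
tail -n 200 /var/log/glusterfs/HPCdata.log
tail -n 200 /var/log/glusterfs/bricks/data.log

# Kernel-side messages, in case the RDMA transport is reporting errors
dmesg | tail -50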