Gluster Volume hangs (version 3.2.5)

alessio.checcucci at gmail.com (Alessio Checcucci) · Fri, 16 Mar 2012 14:19:00 +0800

Dear Mohit,
I have performed some extensive tests, deliberately overloading the filesystem with I/O, and I was able to reproduce the problem even without mongodb. As you suggested, I drilled down into the IB fabric and detected some strange switch behaviour. I am continuing the investigation on this side and will report back as soon as I have been able to shed some light on it.
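The kernel trace in the original report shows mongod blocked in sys_msync, then vfs_fsync and fuse_fsync, i.e. while flushing a memory-mapped file. A mongodb-free reproducer can therefore dirty memory-mapped files on the Gluster mount and msync them from several threads at once. This is a minimal sketch, assuming a writable scratch directory on /HPCdata; the worker count, file size and round count are arbitrary, hypothetical values. In CPython on Unix, mmap.flush() performs an msync().

```python
import mmap
import os
from concurrent.futures import ThreadPoolExecutor


def mmap_write_and_flush(path: str, size: int, rounds: int) -> int:
    """Dirty every page of a `size`-byte memory-mapped file and msync it, `rounds` times.

    Returns the number of flushes performed.
    """
    with open(path, "wb") as f:
        f.truncate(size)  # pre-size the file so the whole range can be mapped
    flushes = 0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as mm:
        for r in range(rounds):
            page = bytes([r % 256]) * mmap.PAGESIZE
            for off in range(0, size, mmap.PAGESIZE):
                end = min(off + mmap.PAGESIZE, size)
                mm[off:end] = page[: end - off]  # touch every page, including a partial last one
            mm.flush()  # msync(MS_SYNC) on Unix: the call path mongod was blocked in
            flushes += 1
    return flushes


def stress(directory: str, workers: int = 8, size: int = 64 * 1024 * 1024, rounds: int = 10) -> int:
    """Run `workers` concurrent mmap/msync writers in `directory`; return total flushes."""
    paths = [os.path.join(directory, f"stress_{i}.bin") for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda p: mmap_write_and_flush(p, size, rounds), paths))
```

For example, running stress("/HPCdata/stress_test", workers=12) (a hypothetical directory) on every node at the same time approximates a heavy, msync-driven write load; it is not a faithful replica of what mongoimport does.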

Thanks a lot
Alessio 

On 15/03/2012, at 22:54 , Mohit Anchlia wrote:

> Can you break your CSV into small chunks and try again? It appears that the network is somehow getting overwhelmed. Have you checked the switches for any errors?
> 
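Mohit's suggestion is easy to script. A minimal sketch that splits a large CSV into numbered chunk files and repeats the header row in each, so every chunk can be imported on its own with mongoimport's --headerline option; the chunk size and the file naming scheme are arbitrary choices for illustration.

```python
import csv
import os
from typing import List


def split_csv(src: str, out_dir: str, rows_per_chunk: int) -> List[str]:
    """Split `src` into files of at most `rows_per_chunk` data rows each.

    The header row is copied into every chunk. Returns the chunk paths in order.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be >= 1")
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.splitext(os.path.basename(src))[0]
    paths: List[str] = []
    out = None
    try:
        with open(src, newline="") as f:
            reader = csv.reader(f)
            header = next(reader, None)
            if header is None:
                return paths  # empty input: nothing to split
            for i, row in enumerate(reader):
                if i % rows_per_chunk == 0:  # start a new chunk
                    if out is not None:
                        out.close()
                    paths.append(os.path.join(out_dir, f"{base}_{len(paths):05d}.csv"))
                    out = open(paths[-1], "w", newline="")
                    writer = csv.writer(out)
                    writer.writerow(header)
                writer.writerow(row)
    finally:
        if out is not None:
            out.close()
    return paths
```

Each chunk can then be loaded separately, e.g. with mongoimport --type csv --headerline --file <chunk> plus the usual --db/--collection options, pausing between chunks to see whether the hang still occurs.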
> On Wed, Mar 14, 2012 at 11:39 PM, Alessio Checcucci <alessio.checcucci at gmail.com> wrote:
> Dear Mohit,
> thanks for your answer. The setup is fairly new: we configured it about one month ago, and the iozone tests we performed never highlighted any problem.
> The servers are SGI machines based on Supermicro hardware, each one features: 
> 2 Xeon X5650 6-core CPUs
> 96GB of RAM 
> two Intel Gigabit interfaces 
> 1 Mellanox ConnectX-2 IB HCA
> 1 LSI 1068E SATA RAID controller
> 6 Seagate ST32000644NS 2TB HDDs
> 
> The Gluster nodes work quite smoothly; they act both as bricks and as clients, mounting the Gluster filesystem through the FUSE driver. Unfortunately, when we run the mongo import (from a huge CSV file), after some time (minutes) all the mounts freeze completely and the FUSE error (with the related timeout) I reported in my first message is logged.
> Looking at the volume log on the Gluster bricks, we can see the following messages:
> 
> [2012-03-15 04:45:07.352455] E [rdma.c:3415:rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send work request on `mlx4_0' returned error wc.status = 12, wc.vendor_err = 129, post->buf = 0x2b08000, wc.byte_len = 0, post->reused = 2
> [2012-03-15 04:45:07.352510] E [rdma.c:3423:rdma_handle_failed_send_completion] 0-rdma: connection between client and server not working. check by running 'ibv_srq_pingpong'. also make sure subnet manager is running (eg: 'opensm'), or check if rdma port is valid (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the problem persists.
> [2012-03-15 04:45:07.352535] E [rdma.c:3415:rdma_handle_failed_send_completion] 0-rpc-transport/rdma: send work request on `mlx4_0' returned error wc.status = 12, wc.vendor_err = 129, post->buf = 0x2b0a000, wc.byte_len = 0, post->reused = 5
> [2012-03-15 04:45:07.352545] E [rdma.c:3423:rdma_handle_failed_send_completion] 0-rdma: connection between client and server not working. check by running 'ibv_srq_pingpong'. also make sure subnet manager is running (eg: 'opensm'), or check if rdma port is valid (or active) by running 'ibv_devinfo'. contact Gluster Support Team if the problem persists.
> [2012-03-15 04:45:07.352900] E [rpc-clnt.c:341:saved_frames_unwind] (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7fda3f424568] (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7fda3f423cfd] (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fda3f423c5e]))) 0-HPC_data-client-4: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-03-15 04:45:03.336837
> [2012-03-15 04:45:07.352942] E [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
> [2012-03-15 04:45:07.352956] I [client.c:1883:client_rpc_notify] 0-HPC_data-client-4: disconnected
> [2012-03-15 04:45:07.353301] E [rpc-clnt.c:341:saved_frames_unwind] (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_notify+0x78) [0x7fda3f424568] (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7d) [0x7fda3f423cfd] (-->/opt/glusterfs/3.2.5/lib64/libgfrpc.so.0(saved_frames_destroy+0xe) [0x7fda3f423c5e]))) 0-HPC_data-client-5: forced unwinding frame type(GlusterFS 3.1) op(LOOKUP(27)) called at 2012-03-15 04:45:03.336880
> [2012-03-15 04:45:07.353317] E [client3_1-fops.c:2228:client3_1_lookup_cbk] 0-glusterfs: remote operation failed: Transport endpoint is not connected
> [2012-03-15 04:45:07.353326] I [dht-layout.c:581:dht_layout_normalize] 0-HPC_data-dht: found anomalies in /. holes=1 overlaps=0
> [2012-03-15 04:45:07.353335] I [dht-selfheal.c:569:dht_selfheal_directory] 0-HPC_data-dht: 2 subvolumes down -- not fixing
> [2012-03-15 04:45:07.353365] I [client.c:1883:client_rpc_notify] 0-HPC_data-client-5: disconnected
> [2012-03-15 04:45:17.703676] I [client-handshake.c:1090:select_server_supported_programs] 0-HPC_data-client-4: Using Program GlusterFS 3.2.5, Num (1298437), Version (310)
> [2012-03-15 04:45:17.703857] I [client-handshake.c:913:client_setvolume_cbk] 0-HPC_data-client-4: Connected to 192.168.100.165:24009, attached to remote volume '/data'.
> [2012-03-15 04:45:17.706408] I [client-handshake.c:1090:select_server_supported_programs] 0-HPC_data-client-5: Using Program GlusterFS 3.2.5, Num (1298437), Version (310)
> [2012-03-15 04:45:17.706566] I [client-handshake.c:913:client_setvolume_cbk] 0-HPC_data-client-5: Connected to 192.168.100.166:24009, attached to remote volume '/data'.
> [2012-03-15 06:28:09.624927] I [dht-layout.c:581:dht_layout_normalize] 0-HPC_data-dht: found anomalies in /database/mongo/hipass_fixed/journal. holes=1 overlaps=0
> [2012-03-15 06:28:09.704031] I [dht-layout.c:581:dht_layout_normalize] 0-HPC_data-dht: found anomalies in /database/mongo/hipass_fixed/_tmp. holes=1 overlaps=0
> 
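The status codes in these RDMA errors come from libibverbs' enum ibv_wc_status, in which 12 is IBV_WC_RETRY_EXC_ERR (transport retry counter exceeded: the remote side never acknowledged the send), which points at the fabric rather than at Gluster itself. A small sketch that extracts wc.status and wc.vendor_err from brick log lines and names the status; only a subset of the enum is tabulated, and vendor_err values are HCA-specific, so they are left undecoded.

```python
import re
from collections import Counter
from typing import Iterable

# Subset of libibverbs' enum ibv_wc_status (see <infiniband/verbs.h>).
WC_STATUS = {
    0: "IBV_WC_SUCCESS",
    5: "IBV_WC_WR_FLUSH_ERR",
    10: "IBV_WC_REM_ACCESS_ERR",
    12: "IBV_WC_RETRY_EXC_ERR",
    13: "IBV_WC_RNR_RETRY_EXC_ERR",
}

# Matches the rdma.c "send work request ... returned error" message format quoted above.
WC_RE = re.compile(
    r"on `(?P<dev>[^']+)' returned error wc\.status = (?P<status>\d+), "
    r"wc\.vendor_err = (?P<vendor>\d+)"
)


def summarize_wc_errors(lines: Iterable[str]) -> Counter:
    """Count failed send completions per (device, status name, vendor_err)."""
    counts: Counter = Counter()
    for line in lines:
        m = WC_RE.search(line)
        if m:
            status = int(m.group("status"))
            name = WC_STATUS.get(status, f"status {status}")
            counts[(m.group("dev"), name, int(m.group("vendor")))] += 1
    return counts
```

Fed the lines above, it reports two IBV_WC_RETRY_EXC_ERR completions on mlx4_0; a burst of these across several bricks at the same instant is consistent with a switch-side problem.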
> We checked the InfiniBand infrastructure and it is still working, so we suspect the problem lies elsewhere.
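"Still working" is easiest to confirm port by port, and the rdma error message itself points to ibv_devinfo. A minimal sketch that parses ibv_devinfo's text output into (hca_id, port, state) tuples so that any port not in PORT_ACTIVE stands out. The parser assumes the usual hca_id: / port: / state: layout of that output; check it against your version. An active port only means the link is up, so it does not rule out errors on the switch side.

```python
import subprocess
from typing import List, Optional, Tuple


def parse_port_states(text: str) -> List[Tuple[str, int, str]]:
    """Return (hca_id, port number, state) for every port in ibv_devinfo output."""
    result: List[Tuple[str, int, str]] = []
    hca: Optional[str] = None
    port: Optional[int] = None
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        value = value.strip()
        if key == "hca_id":
            hca, port = value, None  # new adapter: reset the current port
        elif key == "port":
            port = int(value)
        elif key == "state" and value and hca is not None and port is not None:
            result.append((hca, port, value.split()[0]))  # drop the "(4)" suffix
    return result


def inactive_ports() -> List[Tuple[str, int, str]]:
    """Run ibv_devinfo (assumed to be on PATH) and list ports that are not active."""
    out = subprocess.run(["ibv_devinfo"], capture_output=True, text=True, check=True).stdout
    return [p for p in parse_port_states(out) if p[2] != "PORT_ACTIVE"]
```

Running inactive_ports() on all six nodes while the import is in progress gives a quick yes/no per HCA port; an empty list on every node means the links stayed up.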
> 
> Thanks a lot for your help,
> Alessio
> 
> 
> On 15/03/2012, at 12:32 , Mohit Anchlia wrote:
> 
>> Is this a new setup, or did it use to work before? How are the CPU, memory, etc.? Also, what do you see on the gluster nodes?
>> 
>> On Wed, Mar 14, 2012 at 7:33 PM, Alessio Checcucci <alessio.checcucci at gmail.com> wrote:
>> Dear All,
>> we are facing a problem in our computer room. We have 6 servers that act as bricks for GlusterFS, configured in the following way:
>> 
>> OS: Centos 6.2 x86_64
>> Kernel: 2.6.32-220.4.2.el6.x86_64
>> 
>> Gluster RPM packages:
>> glusterfs-core-3.2.5-2.el6.x86_64
>> glusterfs-rdma-3.2.5-2.el6.x86_64
>> glusterfs-geo-replication-3.2.5-2.el6.x86_64
>> glusterfs-fuse-3.2.5-2.el6.x86_64
>> 
>> Each one contributes an XFS filesystem to the global volume, and the transport mechanism is RDMA:
>> 
>> gluster volume create HPC_data transport rdma pleiades01:/data pleiades02:/data pleiades03:/data pleiades04:/data pleiades05:/data pleiades06:/data
>> 
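Because this create command gives no replica count, HPC_data is a plain distributed volume: each file lives on exactly one of the six bricks, and DHT gives each brick a slice of a 32-bit hash range per directory. That is why the volume log reports layout "holes" and "2 subvolumes down" when client-4 and client-5 disconnect: files hashing into the missing slices cannot be reached. The toy sketch below illustrates only the idea. It uses zlib.crc32 and equal, in-order slices as stand-ins, whereas GlusterFS uses its own filename hash and stores its own per-directory layouts.

```python
import zlib
from typing import List, Set, Tuple

HASH_SPACE = 2 ** 32
Layout = List[Tuple[int, int, str]]  # (start, end, brick) with end exclusive


def equal_layout(bricks: List[str]) -> Layout:
    """Toy layout: split the 32-bit hash space into one contiguous slice per brick."""
    step = HASH_SPACE // len(bricks)
    layout: Layout = []
    for i, brick in enumerate(bricks):
        end = HASH_SPACE if i == len(bricks) - 1 else (i + 1) * step
        layout.append((i * step, end, brick))
    return layout


def locate(name: str, layout: Layout) -> str:
    """Return the brick whose slice contains the stand-in hash of `name`."""
    h = zlib.crc32(name.encode()) & 0xFFFFFFFF
    for start, end, brick in layout:
        if start <= h < end:
            return brick
    raise AssertionError("layout does not cover the hash space")


def count_holes(layout: Layout, down: Set[str]) -> int:
    """Count maximal runs of adjacent slices whose brick is down."""
    holes, prev_down = 0, False
    for _, _, brick in layout:
        is_down = brick in down
        if is_down and not prev_down:
            holes += 1
        prev_down = is_down
    return holes


def unreachable(names: List[str], layout: Layout, down: Set[str]) -> List[str]:
    """Names that hash into a slice owned by a down brick."""
    return [n for n in names if locate(n, layout) in down]
```

In this toy model two adjacent slices going down show up as a single hole, while two separated ones count as two; in either case, roughly a third of the files in a directory become unreachable until the bricks reconnect.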
>> Each server mounts the volume through the fuse driver on a dedicated mount point, using the following fstab entry:
>> 
>> pleiades01:/HPC_data        /HPCdata                glusterfs defaults,_netdev 0 0
>> 
>> We are running mongodb on top of the Gluster volume for performance testing, and the speed is definitely high. Unfortunately, when we run a large mongoimport job, the GlusterFS volume hangs completely shortly after the job starts and becomes inaccessible from every node. After some time, the following error is logged in /var/log/messages:
>> 
>> Mar  8 08:16:03 pleiades03 kernel: INFO: task mongod:5508 blocked for more than 120 seconds.
>> Mar  8 08:16:03 pleiades03 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Mar  8 08:16:03 pleiades03 kernel: mongod        D 0000000000000007     0  5508      1 0x00000000
>> Mar  8 08:16:03 pleiades03 kernel: ffff881709b95de8 0000000000000086 0000000000000000 0000000000000008
>> Mar  8 08:16:03 pleiades03 kernel: ffff881709b95d68 ffffffff81090a7f ffff8816b6974cc0 0000000000000000
>> Mar  8 08:16:03 pleiades03 kernel: ffff8817fdd81af8 ffff881709b95fd8 000000000000f4e8 ffff8817fdd81af8
>> Mar  8 08:16:03 pleiades03 kernel: Call Trace:
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff81090a7f>] ? wake_up_bit+0x2f/0x40
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff81090d7e>] ? prepare_to_wait+0x4e/0x80
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffffa112c6b5>] fuse_set_nowrite+0xa5/0xe0 [fuse]
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff81090a90>] ? autoremove_wake_function+0x0/0x40
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffffa112fd48>] fuse_fsync_common+0xa8/0x180 [fuse]
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffffa112fe30>] fuse_fsync+0x10/0x20 [fuse]
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff811a52d1>] vfs_fsync_range+0xa1/0xe0
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff811a537d>] vfs_fsync+0x1d/0x20
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff81144421>] sys_msync+0x151/0x1e0
>> Mar  8 08:16:03 pleiades03 kernel: [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b
>> 
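When several nodes hang at once, it helps to pull these reports out of /var/log/messages automatically. A minimal sketch that finds "blocked for more than N seconds" reports and records whether the following call trace passes through the fuse module; it is based only on the message format quoted above, and other kernel versions may format it differently.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

HUNG_RE = re.compile(
    r"INFO: task (?P<comm>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)
# A call-trace frame such as "[<ffffffffa112c6b5>] fuse_set_nowrite+0xa5/0xe0 [fuse]".
FRAME_RE = re.compile(
    r"\] \??\s*(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(?P<mod>\w+)\])?"
)


@dataclass
class HungTask:
    host: str
    comm: str
    pid: int
    seconds: int
    frames: List[str] = field(default_factory=list)

    @property
    def in_fuse(self) -> bool:
        """True if any frame of the trace belongs to the fuse module."""
        return any(f.endswith("[fuse]") for f in self.frames)


def parse_hung_tasks(lines: Iterable[str]) -> List[HungTask]:
    """Collect hung-task reports; frames are attached to the most recent report."""
    tasks: List[HungTask] = []
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            host = line.split()[3]  # syslog layout: "Mon DD HH:MM:SS host kernel: ..."
            tasks.append(HungTask(host, m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
            continue
        f = FRAME_RE.search(line)
        if f and tasks:
            mod = f.group("mod")
            tasks[-1].frames.append(f.group("func") + (f" [{mod}]" if mod else ""))
    return tasks
```

Collecting the output from all six nodes makes it easy to see whether every blocked task is stuck in the FUSE fsync path, as mongod is here.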
>> Any attempt to access the volume from any node is fruitless until the mongodb process is killed; sessions accessing the /HPCdata path freeze on every node.
>> In any case, a complete (forced) stop and start of the volume is needed to bring it back into operation.
>> The situation can be reproduced at will.
>> Is anybody able to help us? Is there any further information we could collect to help diagnose the problem?
>> 
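Because a hung FUSE mount blocks any process that touches it, a monitoring check should not stat /HPCdata directly from its main thread. One common pattern, sketched here as an assumption rather than anything Gluster provides, is to run the stat in a daemon thread and give up after a timeout. A thread stuck in uninterruptible sleep (state D, as in the trace above) cannot be killed, so the probe only reports the hang and leaves the stuck thread behind.

```python
import os
import threading
from typing import Optional


def mount_responds(path: str, timeout: float = 5.0) -> Optional[bool]:
    """Probe `path` with os.stat() in a background thread.

    Returns True if the stat succeeds within `timeout` seconds, False if it
    raises OSError (e.g. "Transport endpoint is not connected"), or None if
    it is still blocked when the timeout expires.
    """
    outcome: dict = {}

    def probe() -> None:
        try:
            os.stat(path)
            outcome["ok"] = True
        except OSError:
            outcome["ok"] = False

    t = threading.Thread(target=probe, daemon=True)  # daemon: a stuck probe won't block exit
    t.start()
    t.join(timeout)
    if t.is_alive():
        return None  # still blocked: treat the mount as hung
    return outcome["ok"]
```

A check such as mount_responds("/HPCdata", timeout=10) run from cron on each node would timestamp exactly when the mount stops responding, which can then be matched against the brick and switch logs.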
>> Thanks a lot
>> Alessio 
>> 
>> 
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users at gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>> 
>> 
> 
> 
