File system slow & crash

Somsak Sriprayoonsakul <somsaks@xxxxxxxxx> · Thu, 22 Apr 2010 02:27:55 +0700

Hello,

We are using GFS2 on 3 nodes cluster, kernel 2.6.18-164.6.1.el5, RHEL/CentOS5, x86_64 with
8-12GB memory in each node. The underlying storage is HP 2312fc smart array equipped
with 12 SAS 15K rpm, configured as RAID10 using 10 HDDs + 2 spares. The
array has about 4GB cache. Communication is 4Gbps FC, through HP
StorageWorks 8/8 Base e-port SAN Switch.

Our application is apache version 1.3.41, mostly serving static
HTML file + few PHP. Note that, we have to downgrade to 1.3.41 due to
application requirement. Apache was configured with 500
MaxClients. Each HTML file is placed in different directory. The PHP script modify HTML file and do some locking prior to HTML modification. We use round-robin DNS to load balance between each web server.

The GFS2 storage was formatted with 4 journals, which is run over a LVM volume. We have configured CMAN, QDiskd, Fencing as appropriate and everything works just fine. We used QDiskd since the cluster initially only has 2 nodes. We used manual_fence temporarily since no fencing hardware was configured yet. GFS2 is mounted with noatime,nodiratime option.

Initially, the application was running fine. The problem we encountered is that, over time, load average on some nodes would gradually reach about 300-500, where in normal workload the machine should have about 10. When the load piled up, HTML modification will mostly fail.

We suspected that this might be plock_rate issue, so we modified cluster.conf configuration as well as adding some more mount options, such as num_glockd=16 and data="" to increase the performance. After we successfully reboot the system and mount the volume. We tried ping_pong (http://wiki.samba.org/index.php/Ping_pong) test to see how fast the lock can perform. The lock speed greatly increase from 100 to 3-5k/sec. However, after running ping_pong on all 3 nodes simultaneously, the ping_pong program hang with D state and we could not kill the process even with SIGKILL.

Due to the time constraint, we decided to leave the system as is, letting ping_pong stuck on all nodes while serving web request. After runing for hours, the httpd process got stuck in D state and couldn't be killed. All web serving was not possible at all. We have to reset all machine (unmount was not possible). The machines were back and GFS volume was back to normal. 

Since we have to reset all machines, I decided to run gfs2_fsck on the volume. So I unmounted GFS2 on all nodes, run gfs2_fsck, answer "y" to many question about freeing block, and I got the volume back. However, the process stuck up occurred again very quickly. More seriously, trying to kill a running process in GFS or unmount it yield kernel panic and suspend the volume. 

After this, the volume was never back to normal again. The volume will crash (kernel panic) almost immediately when we try to write something to it. This happened even if I removed mount option and just leave noatime and nodiratime. I didn't run gfs2_fsck again yet, since we decided to leave it as is and trying to backup as much data as possible.

Sorry for such a long story. In summary, my question is

What could be the cause of load average pile up? Note that sometimes happened only on some nodes, although DNS round robin should fairly distribute workload to all nodes. At the least the load different shouldn't be that much.

Should we run gfs2_fsck again? Why the lock up occur?

I have attached our cluster.conf as well as kernel panic log with this e-mail.

Thank you very much in advance

Best Regards,

===========================================
Somsak Sriprayoonsakul

INOX

Attachment:
cluster.conf

Description: Binary data
Apr 21 23:06:43 cafe2 kernel: dlm: data1: group leave failed -512 0
Apr 21 23:06:43 cafe2 dlm_controld[8075]: open "/sys/kernel/dlm/data1/event_done" error -1 2
Apr 21 23:06:43 cafe2 kernel: GFS2: fsid=pantip:data1.2: withdrawn
Apr 21 23:06:43 cafe2 kernel: 
Apr 21 23:06:43 cafe2 kernel: Call Trace:
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8854c3ce>] :gfs2:gfs2_lm_withdraw+0xc1/0xd0
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff80017a2d>] cache_grow+0x35a/0x3c1
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8005c2b4>] cache_alloc_refill+0x106/0x186
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8854e242>] :gfs2:__glock_lo_add+0x62/0x89
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8855f58f>] :gfs2:gfs2_consist_rgrpd_i+0x34/0x39
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8855c08c>] :gfs2:rgblk_free+0x13a/0x15c
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8855cd83>] :gfs2:gfs2_free_data+0x27/0x9a
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88541985>] :gfs2:do_strip+0x2c9/0x349
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff885407e2>] :gfs2:recursive_scan+0xf2/0x175
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff885408fe>] :gfs2:trunc_dealloc+0x99/0xe7
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff885416bc>] :gfs2:do_strip+0x0/0x349
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff80090000>] sched_exit+0xb4/0xb5
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88557dda>] :gfs2:gfs2_delete_inode+0xdd/0x191
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88557d43>] :gfs2:gfs2_delete_inode+0x46/0x191
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88547e77>] :gfs2:gfs2_glock_schedule_for_reclaim+0x5d/0x9a
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88557cfd>] :gfs2:gfs2_delete_inode+0x0/0x191
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8002f48f>] generic_delete_inode+0xc6/0x143
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8855c9a4>] :gfs2:gfs2_inplace_reserve_i+0x63b/0x691
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88547dd8>] :gfs2:do_promote+0xf5/0x137
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8855124a>] :gfs2:gfs2_write_begin+0x16c/0x339
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88552a83>] :gfs2:gfs2_file_buffered_write+0xf3/0x26c
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88552e54>] :gfs2:__gfs2_file_aio_write_nolock+0x258/0x28f
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88552ff6>] :gfs2:gfs2_file_write_nolock+0xaa/0x10f
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8009fc08>] autoremove_wake_function+0x0/0x2e
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8003f118>] vma_prio_tree_insert+0x20/0x38
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8001cbcb>] vma_link+0xd0/0xfd
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff88553146>] :gfs2:gfs2_file_write+0x49/0xa7
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8001691b>] vfs_write+0xce/0x174
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff800171d3>] sys_write+0x45/0x6e
Apr 21 23:06:43 cafe2 kernel:  [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Apr 21 23:06:43 cafe2 kernel: 
Apr 21 23:06:43 cafe2 kernel: GFS2: fsid=pantip:data1.2: gfs2_delete_inode: -5
Apr 21 23:06:43 cafe2 kernel: VFS:Filesystem freeze failed
Apr 21 23:07:33 cafe2 shutdown[8848]: shutting down for system reboot

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster