We are running GFS2 on a 3-node cluster (kernel 2.6.18-164.6.1.el5, RHEL/CentOS 5, x86_64) with 8-12GB of memory in each node. The underlying storage is an HP 2312fc smart array equipped with 12 15K rpm SAS disks, configured as RAID10 using 10 HDDs + 2 spares. The array has about 4GB of cache. Communication is 4Gbps FC through an HP StorageWorks 8/8 Base e-port SAN switch.
Our application is Apache 1.3.41, mostly serving static HTML files plus a few PHP scripts. Note that we had to downgrade to 1.3.41 due to an application requirement. Apache is configured with MaxClients 500. Each HTML file is placed in a different directory. The PHP scripts modify the HTML files and take a lock before each modification. We use round-robin DNS to balance load across the web servers.
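To make the lock-then-modify pattern concrete, here is a minimal sketch in Python rather than PHP. It assumes the lock is a POSIX fcntl lock; the post does not say which lock type the real script uses. The function name update_html and its transform argument are hypothetical.

```python
import fcntl
import os


def update_html(path, transform):
    """Rewrite `path` in place while holding an exclusive POSIX lock.

    `transform` is a hypothetical callback mapping the old text to the new text.
    """
    with open(path, "r+", encoding="utf-8") as f:
        # Blocks until the whole-file exclusive lock is granted.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            old = f.read()
            new = transform(old)
            f.seek(0)
            f.write(new)
            f.truncate()          # drop leftover bytes if the new text is shorter
            f.flush()
            os.fsync(f.fileno())  # push the change to storage before unlocking
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return new
```

On a cluster file system every such lock request has to be coordinated between nodes, so the lock step is a natural place to look when modifications start to fail under load.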
The GFS2 file system was formatted with 4 journals and runs on top of an LVM volume. We have configured CMAN, QDiskd, and fencing as appropriate, and everything works fine. We used QDiskd because the cluster initially had only 2 nodes. We are using manual_fence temporarily because no fencing hardware has been configured yet. GFS2 is mounted with the noatime,nodiratime options.
Initially, the application ran fine. The problem is that, over time, the load average on some nodes gradually climbs to about 300-500, whereas under normal workload it should be about 10. When the load piles up, most HTML modifications fail.
We suspected that this might be a plock_rate issue, so we modified cluster.conf and added some more mount options, such as num_glockd=16 and data="", to improve performance. After rebooting the system and mounting the volume successfully, we ran the ping_pong test (http://wiki.samba.org/index.php/Ping_pong) to see how fast locking performs. The lock rate increased greatly, from 100 to 3-5k locks/sec. However, after running ping_pong on all 3 nodes simultaneously, the ping_pong program hung in D state and we could not kill the process, even with SIGKILL.
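For readers who want a rough feel for what "locks per second" means here, the following is a minimal single-process sketch in Python. It is not the ping_pong algorithm (which coordinates byte-range locks between several nodes); it only times how many lock/unlock cycles on one byte a single process can complete. The function name lock_rate is hypothetical.

```python
import fcntl
import os
import time


def lock_rate(path, seconds=1.0):
    """Return POSIX lock/unlock cycles per second on byte 0 of `path`."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        count = 0
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            # Lock and release a 1-byte range starting at offset 0.
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)
            count += 1
        return count / seconds
    finally:
        os.close(fd)
```

Running it against a file on a local file system and then against a file on the shared volume gives a rough comparison only; it says nothing about contention between nodes, which is what ping_pong exercises.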
Due to time constraints, we decided to leave the system as is, with ping_pong stuck on all nodes, while it kept serving web requests. After running for hours, the httpd processes got stuck in D state and couldn't be killed. Web serving was no longer possible at all. We had to reset all machines (unmounting was not possible). The machines came back up and the GFS volume was back to normal.
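On Linux, processes in D state (uninterruptible sleep) count toward the load average, so counting them per node is one way to see whether stuck processes account for the numbers. The sketch below is a minimal Python example that reads /proc/<pid>/stat; the function names parse_stat and uninterruptible_processes are hypothetical.

```python
import glob


def parse_stat(line):
    """Return (pid, comm, state) from the contents of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    lparen = line.index("(")
    rparen = line.rindex(")")
    pid = int(line[:lparen])
    comm = line[lparen + 1:rparen]
    state = line[rparen + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes():
    """List (pid, comm) for every process currently in D state."""
    found = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_processes():
        print(pid, comm)
```

Comparing the count of D-state httpd processes with the reported load average on each node would show whether the load is made up of blocked processes rather than CPU work.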
Since we had to reset all machines, I decided to run gfs2_fsck on the volume. I unmounted GFS2 on all nodes, ran gfs2_fsck, answered "y" to many questions about freeing blocks, and got the volume back. However, processes got stuck again very quickly. More seriously, trying to kill a running process using GFS or to unmount it resulted in a kernel panic and suspended the volume.
After this, the volume never returned to normal. It crashes (kernel panic) almost immediately when we try to write anything to it. This happens even if I remove the extra mount options and leave only noatime and nodiratime. I haven't run gfs2_fsck again yet, since we decided to leave the volume as is and try to back up as much data as possible.
Sorry for such a long story. In summary, my questions are:
- What could be the cause of the load average piling up? Note that it sometimes happens only on some nodes, although round-robin DNS should distribute the workload fairly evenly across all nodes. At the very least, the load difference shouldn't be that large.
- Should we run gfs2_fsck again? Why does the lockup occur?
I have attached our cluster.conf as well as kernel panic log with this e-mail.
Thank you very much in advance
Best Regards,
===========================================
Somsak Sriprayoonsakul
INOX
Attachment:
cluster.conf
Description: Binary data
Apr 21 23:06:43 cafe2 kernel: dlm: data1: group leave failed -512 0
Apr 21 23:06:43 cafe2 dlm_controld[8075]: open "/sys/kernel/dlm/data1/event_done" error -1 2
Apr 21 23:06:43 cafe2 kernel: GFS2: fsid=pantip:data1.2: withdrawn
Apr 21 23:06:43 cafe2 kernel:
Apr 21 23:06:43 cafe2 kernel: Call Trace:
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8854c3ce>] :gfs2:gfs2_lm_withdraw+0xc1/0xd0
Apr 21 23:06:43 cafe2 kernel: [<ffffffff80017a2d>] cache_grow+0x35a/0x3c1
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8005c2b4>] cache_alloc_refill+0x106/0x186
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8854e242>] :gfs2:__glock_lo_add+0x62/0x89
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8855f58f>] :gfs2:gfs2_consist_rgrpd_i+0x34/0x39
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8855c08c>] :gfs2:rgblk_free+0x13a/0x15c
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8855cd83>] :gfs2:gfs2_free_data+0x27/0x9a
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88541985>] :gfs2:do_strip+0x2c9/0x349
Apr 21 23:06:43 cafe2 kernel: [<ffffffff885407e2>] :gfs2:recursive_scan+0xf2/0x175
Apr 21 23:06:43 cafe2 kernel: [<ffffffff885408fe>] :gfs2:trunc_dealloc+0x99/0xe7
Apr 21 23:06:43 cafe2 kernel: [<ffffffff885416bc>] :gfs2:do_strip+0x0/0x349
Apr 21 23:06:43 cafe2 kernel: [<ffffffff80090000>] sched_exit+0xb4/0xb5
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88557dda>] :gfs2:gfs2_delete_inode+0xdd/0x191
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88557d43>] :gfs2:gfs2_delete_inode+0x46/0x191
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88547e77>] :gfs2:gfs2_glock_schedule_for_reclaim+0x5d/0x9a
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88557cfd>] :gfs2:gfs2_delete_inode+0x0/0x191
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8002f48f>] generic_delete_inode+0xc6/0x143
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8855c9a4>] :gfs2:gfs2_inplace_reserve_i+0x63b/0x691
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88547dd8>] :gfs2:do_promote+0xf5/0x137
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8855124a>] :gfs2:gfs2_write_begin+0x16c/0x339
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88552a83>] :gfs2:gfs2_file_buffered_write+0xf3/0x26c
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88552e54>] :gfs2:__gfs2_file_aio_write_nolock+0x258/0x28f
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88552ff6>] :gfs2:gfs2_file_write_nolock+0xaa/0x10f
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8009fc08>] autoremove_wake_function+0x0/0x2e
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8003f118>] vma_prio_tree_insert+0x20/0x38
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8001cbcb>] vma_link+0xd0/0xfd
Apr 21 23:06:43 cafe2 kernel: [<ffffffff88553146>] :gfs2:gfs2_file_write+0x49/0xa7
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8001691b>] vfs_write+0xce/0x174
Apr 21 23:06:43 cafe2 kernel: [<ffffffff800171d3>] sys_write+0x45/0x6e
Apr 21 23:06:43 cafe2 kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Apr 21 23:06:43 cafe2 kernel:
Apr 21 23:06:43 cafe2 kernel: GFS2: fsid=pantip:data1.2: gfs2_delete_inode: -5
Apr 21 23:06:43 cafe2 kernel: VFS:Filesystem freeze failed
Apr 21 23:07:33 cafe2 shutdown[8848]: shutting down for system reboot