Hi,

On Thu, 2010-04-22 at 02:29 +0700, Somsak Sriprayoonsakul wrote:
> Just notice that, on a node it is using kernel version
> 2.6.18-164.15.1.el5. Don't sure if the difference has any effect.
>
> On Thu, Apr 22, 2010 at 2:27 AM, Somsak Sriprayoonsakul
> <somsaks@xxxxxxxxx> wrote:
> Hello,
>
> We are using GFS2 on 3 nodes cluster, kernel
> 2.6.18-164.6.1.el5, RHEL/CentOS5, x86_64 with 8-12GB memory in
> each node. The underlying storage is HP 2312fc smart array
> equipped with 12 SAS 15K rpm, configured as RAID10 using 10
> HDDs + 2 spares. The array has about 4GB cache. Communication
> is 4Gbps FC, through HP StorageWorks 8/8 Base e-port SAN
> Switch.
>
> Our application is apache version 1.3.41, mostly serving
> static HTML file + few PHP. Note that, we have to downgrade to
> 1.3.41 due to application requirement. Apache was configured
> with 500 MaxClients. Each HTML file is placed in different
> directory. The PHP script modify HTML file and do some locking
> prior to HTML modification. We use round-robin DNS to load
> balance between each web server.
>

Is the PHP script creating new HTML files (and therefore also new
directories) or just modifying existing ones?

Ideally you'd set up the system so that all accesses to a particular
HTML file go to the same node under normal circumstances, and only fail
over to a different node if that particular node fails. That way you
ensure locality of access under normal conditions and thus get the
maximum benefit from the cluster filesystem.

From your description I suspect that it's the I/O pattern across nodes
which is causing the main problem you describe. I suspect that the DNS
round robin is making the situation worse, since it will effectively be
assigning requests to nodes at random.

Having said that, killing processes using GFS2 or trying to umount it
should not cause an oops.
The kill may be ignored for processes in 'D' (uninterruptible sleep)
and likewise the umount may fail with -EBUSY, but any oops is a bug.
Please report it via Red Hat's bugzilla.

Using the num_glockd= command line parameter is not recommended with
GFS2 (in fact it doesn't exist/is ignored in more recent versions).
Setting data=writeback may or may not actually improve performance (it
depends upon the individual workload), but it does increase the
possibility of seeing corrupt data if there is a crash. I would
generally caution against using data=writeback except in very special
cases.

Steve.

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
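
P.S. To illustrate the locality point above, here is a minimal sketch of
one way a front-end dispatcher could map each HTML file to a single
preferred node and move it only when that node is down. It uses
rendezvous (highest-random-weight) hashing, which is my choice for the
example rather than anything GFS2 or Apache provides. The node names
(web1, web2, web3), the example path and the function names are all
hypothetical, and how the set of live nodes is obtained is left out.

```python
import hashlib


def _score(node, path):
    """Deterministic pseudo-random weight for a (node, path) pair."""
    key = (node + "\0" + path).encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big")


def preferred_nodes(path, nodes):
    """Return all nodes ordered from most to least preferred for path."""
    return sorted(nodes, key=lambda node: _score(node, path), reverse=True)


def pick_node(path, nodes, alive):
    """Return the most preferred node for path that is currently alive.

    While the first choice is up, every request for the same path goes
    to that node; the path moves elsewhere only if that node is down.
    """
    for node in preferred_nodes(path, nodes):
        if node in alive:
            return node
    raise RuntimeError("no live node available")


if __name__ == "__main__":
    # Hypothetical node names and path, for illustration only.
    nodes = ["web1", "web2", "web3"]
    path = "/site/a/index.html"
    print(pick_node(path, nodes, alive=set(nodes)))
    print(pick_node(path, nodes, alive={"web2", "web3"}))
```

With a scheme like this, a node failure moves only the files that node
was serving, so the remaining files keep their locality.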