Hi,

On Thu, 2011-03-31 at 10:14 -0400, David Hill wrote:
> These directories are all on the same mount ... with a total size of 1.2TB!
> /mnt/gfs is the mount
> /mnt/gfs/scripts/appl01
> /mnt/gfs/scripts/appl02
> /mnt/gfs/scripts/appl03
> /mnt/gfs/scripts/appl04
> /mnt/gfs/scripts/appl05
> /mnt/gfs/scripts/appl06
> /mnt/gfs/scripts/appl07
> /mnt/gfs/scripts/appl08
>
> All files accessed by the application are within its own folder/subdirectory.
> No file is ever accessed by more than one node.
>
> I'm going to suggest splitting, but this also brings up another issue:
>
> - We have a daily GFS lockout now... We need to reboot the whole cluster to solve the issue.
>
I'm not sure what you mean by that. What actually happens? Is it just the
filesystem that goes slow? Do you get any messages in /var/log/messages?
Do any nodes get fenced, or does that fail too?

Steve.

> This is going bad.
>
> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Alan Brown
> Sent: 31 March 2011 07:21
> To: linux clustering
> Subject: Re: GFS2 cluster node is running very slow
>
> David Hill wrote:
> > Hi Steve,
> >
> > We seem to be experiencing some new issues now... With 4 nodes, only one was slow, but with 3 nodes, 2 of them are now slow.
> > 2 nodes are doing 20kB/s and one is doing 2MB/s ... It seems like all nodes will end up with poor performance.
> > All nodes are locking files in their own directory: /mnt/application/tomcat-1, /mnt/application/tomcat-2 ...
>
> Just to clarify:
>
> Are these directories on the same filesystem or are they on individual
> filesystems?
>
> If the former, try splitting into separate filesystems.
>
> Remember that one node will become the filesystem master and everything
> else will be slower when accessing that filesystem.
>
> I'm out of ideas on this one.
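[Editor's note: splitting along the lines Alan suggests, so that each node only ever touches (and masters the locks of) its own filesystem, might look roughly like this in /etc/fstab. This is a hypothetical sketch; the volume names and the mycluster lock-table name are made up, not taken from the thread.]

```
/dev/vg_san/appl01_lv  /mnt/gfs/scripts/appl01  gfs2  defaults,noatime  0 0
/dev/vg_san/appl02_lv  /mnt/gfs/scripts/appl02  gfs2  defaults,noatime  0 0
# ... one filesystem per application, each created with its own
# lock table, e.g.: mkfs.gfs2 -p lock_dlm -t mycluster:appl01 -j 4 <dev>
```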
> >
> > Dave
> >
> >
> >
> > -----Original Message-----
> > From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of David Hill
> > Sent: 30 March 2011 11:42
> > To: linux clustering
> > Subject: Re: GFS2 cluster node is running very slow
> >
> > Hi Steve,
> >
> > I think you're right about the glock ... There were MANY more of these.
> > We're using a new server with totally different hardware. We've done many tests
> > before posting to the mailing list, like:
> > - copy files from the problematic node to the other nodes without using the problematic mount: everything is fine (7MB/s)
> > - read from the problematic mount on the "broken" node: fine too (21MB/s)
> > So, at this point, I doubt the problem is the network infrastructure behind the node (or the network adapter), because everything is going smoothly in every other respect, BUT
> > we cannot use the /mnt on the broken node because it's not usable. Last time I tried to copy a file to that /mnt it was doing 5kB/s while
> > all the other nodes were doing OK at 7MB/s ...
> >
> > Whenever we do the test, it doesn't seem to go higher than 200kB/s ...
> >
> > But still, we can transfer to all nodes at a decent speed from that host.
> > We can transfer to the SAN at a decent speed.
> >
> > CPU is 0% used.
> > Memory is 50% used.
> > Network is 0% used.
> >
> > The only difference between that host and the others is that the mysql database is hosted locally, with its storage on the same SAN ... but even with this,
> > mysqld is using only 2Mbit/s on the loopback, a little bit of memory and mostly NO CPU.
> >
> >
> > Here is a capture of the system:
> > top - 15:39:51 up 7:40, 1 user, load average: 0.08, 0.13, 0.11
> > Tasks: 343 total, 1 running, 342 sleeping, 0 stopped, 0 zombie
> > Cpu0  : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu1  : 0.1%us, 0.0%sy, 0.0%ni, 99.7%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu2  : 0.1%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu3  : 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu4  : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu5  : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu6  : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu7  : 0.1%us, 0.0%sy, 0.0%ni, 99.8%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu8  : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu9  : 0.1%us, 0.0%sy, 0.0%ni, 99.9%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu10 : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu11 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu12 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu13 : 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu14 : 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu15 : 0.4%us, 0.1%sy, 0.0%ni, 99.4%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu16 : 0.1%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu17 : 0.4%us, 0.1%sy, 0.0%ni, 99.4%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu18 : 0.2%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu19 : 0.6%us, 0.1%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu20 : 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu21 : 0.6%us, 0.1%sy, 0.0%ni, 99.2%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st
> > Cpu22 : 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> > Cpu23 : 0.1%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
> > Mem:  32952896k total,  2453956k used, 30498940k free,   256648k buffers
> > Swap:  4095992k total,        0k used,  4095992k free,   684160k cached
> >
> >
> > It's a monster for what it does. Could it be that it's so much more performant than the other nodes that it kills itself?
> >
> > The server is CentOS 5.5.
> > The filesystem is 98% full (31G remaining on 1.2T) ... but if that is an issue, why are all the other nodes running smoothly and having no issues, but not that one?
> >
> >
> > Thank you for the reply,
> >
> > Dave
> >
> >
> >
> > -----Original Message-----
> > From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Steven Whitehouse
> > Sent: 30 March 2011 07:48
> > To: linux clustering
> > Subject: Re: GFS2 cluster node is running very slow
> >
> > Hi,
> >
> > On Wed, 2011-03-30 at 01:34 -0400, David Hill wrote:
> >> Hi guys,
> >>
> >> I've found this in /sys/kernel/debug/gfs2/fsname/glocks
> >>
> >> H: s:EX f:tW e:0 p:22591 [jsvc] gfs2_inplace_reserve_i+0x451/0x69a
> >> [gfs2]
> >>
> >> H: s:EX f:tW e:0 p:22591 [jsvc] gfs2_inplace_reserve_i+0x451/0x69a
> >> [gfs2]
> >>
> >> H: s:EX f:W e:0 p:806 [pdflush] gfs2_write_inode+0x57/0x152 [gfs2]
> >>
> > This doesn't mean anything without a bit more context. Were these all
> > queued against the same glock? If so, which glock was it?
> >
> >> The application running is Confluence and has 184 threads. The other
> >> nodes work fine, but that specific node is having issues obtaining
> >> locks when it's time to write?
> >>
> > That does sound a bit strange. Are you using a different network card on
> > the slow node? Have you checked to see if there is too much traffic on
> > that network link?
> >
> > Also, how full was the filesystem, and which version of GFS2 are you
> > using (i.e. RHEL x, Fedora x, CentOS, or ...)?
> >
> >
> > Steve.
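[Editor's note: Steve's question — whether those H: holder entries were queued against the same glock — can be answered by grouping holder lines under their G: line in the debugfs dump. A minimal Python sketch; the sample text stands in for /sys/kernel/debug/gfs2/<fsname>/glocks, and the glock numbers in it are invented, not from the thread.]

```python
# Group holder (H:) lines under their glock (G:) line and count how many
# holders are waiting (f: flags containing W), to see whether all the
# waiters are queued against a single glock.
import re

SAMPLE = """\
G:  s:EX n:2/805f f:yI t:EX d:EX/0 a:0 r:4
 H: s:EX f:tW e:0 p:22591 [jsvc] gfs2_inplace_reserve_i+0x451/0x69a [gfs2]
 H: s:EX f:tW e:0 p:22591 [jsvc] gfs2_inplace_reserve_i+0x451/0x69a [gfs2]
G:  s:SH n:2/19c7 f: t:SH d:EX/0 a:0 r:3
 H: s:EX f:W e:0 p:806 [pdflush] gfs2_write_inode+0x57/0x152 [gfs2]
"""

def waiters_per_glock(dump):
    """Return {glock line: number of holders whose f: flags include W}."""
    counts, current = {}, None
    for line in dump.splitlines():
        if line.startswith("G:"):
            current = line
            counts[current] = 0
        elif line.startswith(" H:") and current is not None:
            m = re.search(r"\bf:(\S*)", line)
            if m and "W" in m.group(1):
                counts[current] += 1
    return counts

if __name__ == "__main__":
    for glock, n in waiters_per_glock(SAMPLE).items():
        print(n, glock)
```

On a live node the same function could be fed the real dump, e.g. open("/sys/kernel/debug/gfs2/<clustername:fsname>/glocks").read(), with debugfs mounted.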
> >
> >>
> >> Dave
> >>
> >>
> >>
> >> From: linux-cluster-bounces@xxxxxxxxxx
> >> [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of David Hill
> >> Sent: 29 March 2011 21:00
> >> To: linux-cluster@xxxxxxxxxx
> >> Subject: GFS2 cluster node is running very slow
> >>
> >> Hi guys,
> >>
> >> We have a GFS2 cluster consisting of 3 nodes. At this point,
> >> everything is going smoothly. Now, we have added a new node with more
> >> CPUs and the exact same configuration, but all transactions on the
> >> mount run very slowly.
> >>
> >> Copying a file to the mount is done at about 25kB/s, when on the three
> >> other nodes everything goes smoothly at about 7MB/s.
> >>
> >> CPU on all nodes is idling; all the cluster processes are kind
> >> of sleeping.
> >>
> >> We've tried the ping_pong.c from apache and it seems to be able to
> >> write/read lock files at a decent rate.
> >>
> >> There are other mounts on the system using the same FC
> >> card/fibres/switches/SAN, and all of these are also working at a decent
> >> speed...
> >>
> >> I've been reading a good part of the day, and I can't seem to find a
> >> solution.
> >>
> >>
> >> David C. Hill
> >>
> >> Linux System Administrator - Enterprise
> >>
> >> 514-490-2000 #5655
> >>
> >> http://www.ubi.com
> >>
> >>
> >> --
> >> Linux-cluster mailing list
> >> Linux-cluster@xxxxxxxxxx
> >> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
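[Editor's note: ping_pong.c hammers a shared fcntl lock from several nodes at once to measure cluster lock rates. The same idea can be sketched as a single-node stand-in in Python — repeatedly take and drop a byte-range lock and report the rate. This is an illustration, not the original test; the function name, file path, and iteration count are arbitrary.]

```python
# Single-node sketch of a ping_pong-style lock-rate measurement:
# repeatedly acquire and release an fcntl byte-range lock on one file.
import fcntl
import os
import tempfile
import time

def lock_rate(path, iterations=10000):
    """Repeatedly take and drop an exclusive fcntl lock on byte 0 of
    `path`, returning the rough lock+unlock rate per second."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        start = time.monotonic()
        for _ in range(iterations):
            fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0)  # lock byte 0
            fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0)  # release it
        elapsed = max(time.monotonic() - start, 1e-9)
        return iterations / elapsed
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print("%.0f locks/s" % lock_rate(f.name))
```

Run against a file on the GFS2 mount (rather than a local temp file) from each node in turn, this would give a crude per-node comparison of posix-lock throughput, similar in spirit to what the thread used ping_pong for.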