I'm out of ideas on this one.
Dave
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of David Hill
Sent: 30 March 2011 11:42
To: linux clustering
Subject: Re: GFS2 cluster node is running very slow
Hi Steve,
I think you're right about the glock ... There were MANY more of these.
We're using a new server with totally different hardware. We ran many tests
before posting to the mailing list, such as:
- copy files from the problematic node to the other nodes without using the problematic mount, everything is fine (7MB/s)
- read from the problematic mount on the "broken" node is fine too (21MB/s)
So, at this point, I doubt the problem is the network infrastructure behind the node (or the network adapter), because everything runs smoothly in every respect EXCEPT that
we cannot use /mnt on the broken node: it's too slow to be usable. The last time I tried to copy a file to that /mnt it was doing 5k/s, while
all the other nodes are doing OK at 7MB/s ...
Whenever we do the test, it doesn't seem to go higher than 200k/s ...
But still, we can transfer to all nodes at a decent speed from that host.
We can transfer to the SAN at a decent speed.
CPU is 0% used.
Memory is 50% used.
Network is 0% used.
The only difference between that host and the others is that the MySQL database is hosted locally and its storage is on the same SAN ... but even so,
mysqld is using only 2mbit/s on the loopback, a little memory and almost NO CPU.
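For anyone who wants to repeat the write comparison between nodes in a consistent way, here is a minimal sketch of a timed write. It is only an illustration: the function name `write_throughput`, the 8 MiB default size and the 1 MiB chunk size are assumptions, not the method used for the figures above. The call to `os.fsync` is there so the page cache does not hide a slow mount.

```python
import os
import sys
import time


def write_throughput(path, size_bytes=8 * 1024 * 1024, chunk=1024 * 1024):
    """Write size_bytes of zeros to path, fsync, and return bytes per second."""
    buf = b"\0" * chunk
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < size_bytes:
            n = min(chunk, size_bytes - written)
            f.write(buf[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # force the data out to storage before stopping the clock
    elapsed = time.monotonic() - start
    return written / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Usage: python write_test.py /path/on/mount/testfile [more paths...]
    for target in sys.argv[1:]:
        rate = write_throughput(target)
        print("%s: %.1f kB/s" % (target, rate / 1024.0))
```

Running it once against a file on the GFS2 mount and once against a local path on the same node gives two numbers measured the same way, which makes the gap easier to compare across hosts.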
Here is a capture of the system:
top - 15:39:51 up 7:40, 1 user, load average: 0.08, 0.13, 0.11
Tasks: 343 total, 1 running, 342 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 0.1%us, 0.0%sy, 0.0%ni, 99.7%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.1%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.1%us, 0.0%sy, 0.0%ni, 99.8%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 0.1%us, 0.0%sy, 0.0%ni, 99.9%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 0.0%us, 0.0%sy, 0.0%ni, 99.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 0.1%us, 0.1%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 0.4%us, 0.1%sy, 0.0%ni, 99.4%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 0.1%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 0.4%us, 0.1%sy, 0.0%ni, 99.4%id, 0.1%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 0.2%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 0.6%us, 0.1%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu21 : 0.6%us, 0.1%sy, 0.0%ni, 99.2%id, 0.1%wa, 0.0%hi, 0.1%si, 0.0%st
Cpu22 : 0.2%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu23 : 0.1%us, 0.0%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 32952896k total, 2453956k used, 30498940k free, 256648k buffers
Swap: 4095992k total, 0k used, 4095992k free, 684160k cached
It's a monster for what it does. Could it be so much faster than the other nodes that it kills itself?
The server is running CentOS 5.5.
The filesystem is 98% full (31G remaining of 1.2T) ... but if that is the issue, why are all the other nodes running smoothly, with only that one having problems?
Thank you for the reply,
Dave
-----Original Message-----
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Steven Whitehouse
Sent: 30 March 2011 07:48
To: linux clustering
Subject: Re: GFS2 cluster node is running very slow
Hi,
On Wed, 2011-03-30 at 01:34 -0400, David Hill wrote:
Hi guys,
I've found this in /sys/kernel/debug/gfs2/fsname/glocks
H: s:EX f:tW e:0 p:22591 [jsvc] gfs2_inplace_reserve_i+0x451/0x69a [gfs2]
H: s:EX f:tW e:0 p:22591 [jsvc] gfs2_inplace_reserve_i+0x451/0x69a [gfs2]
H: s:EX f:W e:0 p:806 [pdflush] gfs2_write_inode+0x57/0x152 [gfs2]
This doesn't mean anything without a bit more context. Were these all
queued against the same glock? If so which glock was it?
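Whether a set of holders is queued against the same glock can be read from the dump itself, since each `H:` line belongs to the nearest `G:` line above it. The following is a minimal sketch, assuming the `G:`/`H:` layout described in the kernel's GFS2 glock documentation (field order may differ between kernel versions). The function name `waiters_by_glock` and the sample dump, including its glock numbers and `G:` fields, are made up for illustration.

```python
import re
from collections import defaultdict


def waiters_by_glock(dump_text):
    """Group waiting holders (flag W) under the glock they are queued on."""
    waiters = defaultdict(list)
    current = None  # "type/number" of the most recent G: line
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("G:"):
            m = re.search(r"\bn:(\S+)", line)
            current = m.group(1) if m else None
        elif line.startswith("H:") and current is not None:
            flags = re.search(r"\bf:(\S*)", line)
            if flags and "W" in flags.group(1):
                waiters[current].append(line)
    return dict(waiters)


# Invented sample: the G: lines and glock numbers are NOT from the real system.
SAMPLE = """\
G:  s:UN n:3/17 f:l t:EX d:EX/0 a:0 r:4
 H: s:EX f:tW e:0 p:22591 [jsvc] gfs2_inplace_reserve_i+0x451/0x69a [gfs2]
 H: s:EX f:tW e:0 p:22591 [jsvc] gfs2_inplace_reserve_i+0x451/0x69a [gfs2]
G:  s:EX n:2/42 f:q t:EX d:EX/0 a:0 r:3
 H: s:EX f:H e:0 p:806 [pdflush] gfs2_write_inode+0x57/0x152 [gfs2]
"""

if __name__ == "__main__":
    for glock, holders in waiters_by_glock(SAMPLE).items():
        print(glock, len(holders), "waiting holder(s)")
```

On a live node the text would come from reading /sys/kernel/debug/gfs2/fsname/glocks instead of `SAMPLE`. A glock with many waiting holders is the one worth reporting, together with its `G:` line.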
The application running is Confluence and has 184 threads. The other
nodes work fine, but that specific node is having issues obtaining
locks when it's time to write?
That does sound a bit strange. Are you using a different network card on
the slow node? Have you checked to see if there is too much traffic on
that network link?
Also, how full was the filesystem and which version of GFS2 are you
using (i.e. RHELx, Fedora X or CentOS or....)?
Steve.
Dave
From: linux-cluster-bounces@xxxxxxxxxx
[mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of David Hill
Sent: 29 March 2011 21:00
To: linux-cluster@xxxxxxxxxx
Subject: GFS2 cluster node is running very slow
Hi guys,
We have a GFS2 cluster consisting of 3 nodes. So far,
everything has been running smoothly. Now, we have added a new node with more
CPUs and the
exact same configuration, but all transactions on the mount run very
slowly.
Copying a file to the mount runs at about 25kb/s, whereas on the three
other nodes everything goes smoothly at about 7MB/s.
The CPUs on all nodes are idle most of the time, and all cluster processes are
mostly sleeping.
We've tried the ping_pong.c from apache and it seems to be able to
write/read lock files at a decent rate.
There are other mounts on the system using the same FC
card/fibers/switches/SAN and all of these are also working at a decent
speed...
I've been reading for a good part of the day, and I can't seem to find a
solution.
David C. Hill
Linux System Administrator - Enterprise
514-490-2000#5655
http://www.ubi.com
--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster