Paul,

Does the GFS lock traffic have its own interface on each server and a segmented switch network?

What does the perl script do?

Regards,
Britt Treece
From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Paul n McDowell
Sent: Wednesday, July 12, 2006 10:49 AM
To: linux-cluster@xxxxxxxxxx
Subject: GFS performance
Hi,

Can anyone provide techniques or suggestions to improve GFS performance?

The problem we have is best summarized by the output of a perl script that one of our frustrated developers has written. The script simply creates a file, then reads it, then removes it, and prints out the time it takes for each of these instructions to complete. Sometimes it takes only a second to do all three, while sometimes it takes as long as 15 seconds to do these 3 simple instructions. I've run the script on all five of the GFS file systems and see the same response characteristics. I've run the script when the systems are busy and when they are fairly quiet and still see the same issue.
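To give a concrete idea, the script does something along the lines of the following (a minimal sketch of the timing logic, not the developer's actual code, and the test path is just an example):

#!/usr/bin/perl
# Rough sketch: time the creation, read-back and removal of a small file on a GFS mount.
use strict;
use warnings;
use Time::HiRes qw(time);

my $file = "/crx/gfs_timing_test.$$";   # example path on one of the GFS mounts

my $t0 = time();
open(my $out, '>', $file) or die "create failed: $!";
print $out "test\n";
close($out);
my $t1 = time();

open(my $in, '<', $file) or die "read failed: $!";
my @lines = <$in>;
close($in);
my $t2 = time();

unlink($file) or die "remove failed: $!";
my $t3 = time();

printf "create %.2fs  read %.2fs  remove %.2fs  total %.2fs\n",
       $t1 - $t0, $t2 - $t1, $t3 - $t2, $t3 - $t0;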
To summarize the environment:

RH ES3 update 6 (GFS 6.0), an 11 node cluster environment with 3 redundant lock managers and five GFS file systems mounted on each participating server. The GFS file systems range from 100GB to 1.5TB.

The storage array is an EMC CX700 attached to a dual redundant SAN consisting of four 2Gb Brocade 3900 SAN switches.

The HBAs in all servers are Qlogic qla2340s, firmware version 3.03.14, driver version 7.07.00.

The servers are all HP DL585 64bit AMD Opteron server class machines, each configured with between 8GB and 32GB of memory.
I've raised a support call with Red Hat but, according to their experts, our configuration already seems to be set for optimum performance.

Red Hat provide a utility to get and set tunable GFS file system parameters, but there is next to no supporting documentation.
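For what it's worth, I believe the utility in question is the gettune/settune interface of gfs_tool, used roughly as follows (the parameter name and value below are placeholders; what the individual tunables mean and what values are safe is exactly what we can't find documented):

[root@iclc1g tmp]# gfs_tool gettune /crx
[root@iclc1g tmp]# gfs_tool settune /crx <parameter> <value>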
So, is there anything I can do, or am I missing something obvious that is just plainly mis-configured?
Shown below is the GFS configuration summary derived from lock_gulmd -C and gfs_tool df for each file system. I'll be happy to supply any other information if it will help.

Thanks to all in advance,

Paul McDowell
lock_gulmd -C

# hashed: 0x44164246
cluster {
  name = "cra_gfs"
  lock_gulm {
    heartbeat_rate = 15.000
    allowed_misses = 3
    coreport = 40040
    new_connection_timeout = 15.000
    # server cnt: 3
    # servers = ["iclc1g.cra.applera.net", "iclc2g.cra.applera.net", "ccf001g.cra.applera.net"]
    servers = ["172.20.8.21", "172.20.8.22", "172.20.8.51"]
    lt_partitions = 4
    lt_base_port = 41040
    lt_high_locks = 20971520
    lt_drop_req_rate = 300
    prealloc_locks = 5000000
    prealloc_holders = 11000000
    prealloc_lkrqs = 60
    ltpx_port = 40042
# gfs_tool df /crx
/crx:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_crx"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 1988
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_crx"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type           Total          Used           Free           use%
------------------------------------------------------------------------
inodes         933593         933593         0              100%
metadata       943899         121868         822031         13%
data           128274180      58546879       69727301       46%
[root@iclc1g tmp]# gfs_tool df /crx/data
/crx/data:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_crxdata"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 5970
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_crxdata"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type           Total          Used           Free           use%
------------------------------------------------------------------------
inodes         3296091        3296091        0              100%
metadata       2649271        616186         2033085        23%
data           385236382      310495360      74741022       81%
[root@iclc1g tmp]# gfs_tool df /crx/home
/crx/home:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_crxhome"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 3978
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_crxhome"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type           Total          Used           Free           use%
------------------------------------------------------------------------
inodes         3477487        3477487        0              100%
metadata       3162164        341627         2820537        11%
data           254032093      157709829      96322264       62%
[root@iclc1g tmp]# gfs_tool df /usr/local
/usr/local:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_usrlocal"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 394
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_usrlocal"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type           Total          Used           Free           use%
------------------------------------------------------------------------
inodes         765762         765762         0              100%
metadata       582989         22854          560135         4%
data           24393837       9477084        14916753       39%
[root@iclc1g tmp]# gfs_tool df /data
/data:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_GQ"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 1298
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_GQ"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type           Total          Used           Free           use%
------------------------------------------------------------------------
inodes         10026          10026          0              100%
metadata       282680         189037         93643          67%
data           103761726      94277221       9484505        91%