Hi,
Can anyone provide techniques or suggestions to improve GFS performance?
The problem we have is best summarized by the output of a Perl script that one of our frustrated developers has written.
The script simply creates a file, reads it back, then removes it, printing the time each of these steps takes to complete. Sometimes all three finish in only a second; at other times these three simple operations take as long as 15 seconds. I've run the script on all five of our GFS file systems and see the same response characteristics. I've run it when the systems are busy and when they are fairly quiet, and still see the same issue.
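For illustration, here is a minimal script along these lines (not the developer's exact code; the path under /crx is just an example):

    #!/usr/bin/perl
    # Time three operations on a GFS mount: create a file, read it back,
    # and remove it, printing the elapsed seconds for each step.
    use strict;
    use warnings;
    use Time::HiRes qw(time);    # sub-second timestamps

    my $file = "/crx/tmp/gfs_timing_test.$$";   # example path on a GFS mount

    my $t0 = time;
    open(my $out, '>', $file) or die "create failed: $!";
    print $out "x" x 4096;                      # write one block of data
    close($out);
    printf "create: %.2f s\n", time - $t0;

    $t0 = time;
    open(my $in, '<', $file) or die "read failed: $!";
    my $data = do { local $/; <$in> };          # slurp the whole file
    close($in);
    printf "read:   %.2f s\n", time - $t0;

    $t0 = time;
    unlink($file) or die "unlink failed: $!";
    printf "remove: %.2f s\n", time - $t0;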
To summarize the environment:
RH ES3 update 6 (GFS 6.0), an 11-node cluster with 3 redundant lock managers and five GFS file systems mounted on each participating server. The GFS file systems range from 100GB to 1.5TB.
The storage array is an EMC CX700 attached to a dual redundant SAN consisting of four 2Gb Brocade 3900 SAN switches.
The HBAs in all servers are QLogic qla2340s, firmware version 3.03.14, driver version 7.07.00.
The servers are all HP DL585 64-bit AMD Opteron server-class machines, each configured with between 8GB and 32GB of memory.
I've raised a support call with Red Hat, but according to their experts our configuration is already set for optimum performance.
Red Hat provides a utility to get and set tunable GFS file system parameters, but there is next to no supporting documentation (example invocations below).
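For reference, the utility in question is gfs_tool's gettune/settune interface; invocation looks like this (the demote_secs tunable named here is just to illustrate the syntax; gettune lists what a given version actually exposes):

[root@iclc1g tmp]# gfs_tool gettune /crx
[root@iclc1g tmp]# gfs_tool settune /crx demote_secs 300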
So, is there anything I can do, or am I missing something obvious that is just plainly misconfigured?
Shown below is the GFS configuration summary derived from lock_gulmd -C and gfs_tool df for each file system.
I'll be happy to supply any other information if it will help.
Thanks to all in advance
Paul McDowell
lock_gulmd -C
# hashed: 0x44164246
cluster {
  name = "cra_gfs"
  lock_gulm {
    heartbeat_rate = 15.000
    allowed_misses = 3
    coreport = 40040
    new_connection_timeout = 15.000
    # server cnt: 3
    # servers = ["iclc1g.cra.applera.net", "iclc2g.cra.applera.net", "ccf001g.cra.applera.net"]
    servers = ["172.20.8.21", "172.20.8.22", "172.20.8.51"]
    lt_partitions = 4
    lt_base_port = 41040
    lt_high_locks = 20971520
    lt_drop_req_rate = 300
    prealloc_locks = 5000000
    prealloc_holders = 11000000
    prealloc_lkrqs = 60
    ltpx_port = 40042
  }
}
[root@iclc1g tmp]# gfs_tool df /crx
/crx:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_crx"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 1988
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_crx"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type       Total      Used       Free       use%
------------------------------------------------------------------------
inodes     933593     933593     0          100%
metadata   943899     121868     822031     13%
data       128274180  58546879   69727301   46%
[root@iclc1g tmp]# gfs_tool df /crx/data
/crx/data:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_crxdata"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 5970
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_crxdata"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type       Total      Used       Free       use%
------------------------------------------------------------------------
inodes     3296091    3296091    0          100%
metadata   2649271    616186     2033085    23%
data       385236382  310495360  74741022   81%
[root@iclc1g tmp]# gfs_tool df /crx/home
/crx/home:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_crxhome"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 3978
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_crxhome"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type       Total      Used       Free       use%
------------------------------------------------------------------------
inodes     3477487    3477487    0          100%
metadata   3162164    341627     2820537    11%
data       254032093  157709829  96322264   62%
[root@iclc1g tmp]# gfs_tool df /usr/local
/usr/local:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_usrlocal"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 394
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_usrlocal"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type       Total      Used       Free       use%
------------------------------------------------------------------------
inodes     765762     765762     0          100%
metadata   582989     22854      560135     4%
data       24393837   9477084    14916753   39%
[root@iclc1g tmp]# gfs_tool df /data
/data:
SB lock proto = "lock_gulm"
SB lock table = "cra_gfs:cra_GQ"
SB ondisk format = 1308
SB multihost format = 1401
Block size = 4096
Journals = 11
Resource Groups = 1298
Mounted lock proto = "lock_gulm"
Mounted lock table = "cra_gfs:cra_GQ"
Mounted host data = ""
Journal number = 0
Lock module flags = async
Local flocks = FALSE
Local caching = FALSE
Type       Total      Used       Free       use%
------------------------------------------------------------------------
inodes     10026      10026      0          100%
metadata   282680     189037     93643      67%
data       103761726  94277221   9484505    91%