Re: GFS tuning for combined batch / interactive use

Kevin Maguire <kmaguire@xxxxxxx> · Fri, 17 Dec 2010 20:06:58 +0100 (CET)

Hi

You can get a glock dump via debugfs which may show up contention; look
for type 2 glocks which have lots of lock requests queued but not
granted. The lock requests (holders) are tagged with the relevant
process.

Note I am currently using GFS, not GFS2. And before going further I ran 
the ping_pong test on my cluster and see only about 100 locks/second even 
on just 1 node.  So maybe I should look at plock_rate_limit parameter, 
though not sure if that is our core problem.

Anyway, as I write this my test cluster is being heavily used with batch
jobs, and thus I have a window of opportunity to study it under load (but
not change it).  I have debugfs mounted. There are 10 nodes in this test
cluster. My filesystem is called mygfs, and was created via:

mkfs.gfs -O -t dfoxen-cluster:mygfs -p lock_dlm -j 10 -r 2048 /dev/mapper/vggfs-lvgfs

This is what I have in debugfs:

# find /sys/kernel/debug/ -type f -exec wc -l {} \;
2309 /sys/kernel/debug/dlm/mygfs_locks
0 /sys/kernel/debug/dlm/mygfs_waiters
16258 /sys/kernel/debug/dlm/mygfs
2 /sys/kernel/debug/dlm/clvmd_locks
0 /sys/kernel/debug/dlm/clvmd_waiters
7 /sys/kernel/debug/dlm/clvmd

The lock dump file has content like:

# cat /sys/kernel/debug/dlm/mygfs_locks
id nodeid remid pid xid exflags flags sts grmode rqmode time_ms r_nodeid r_len r_name
14f19eb 0 0 1038 0 0 0 2 3 -1 0 0 24 "       5         cec3e6d"
3da1a67 0 0 31861 0 0 0 2 3 -1 0 0 24 "       5         a0fafc2"
1120003 1 16f0019 3552 0 408 0 2 0 -1 0 1 24 "       3        2d8b9091"
af0002 1 10024 3552 0 408 0 2 0 -1 0 1 24 "       3        2053fbf8"
...

But I don't really see how to work out which type of lock is which from
this file - sorry. Given $2 is the nodeid I can work out who has locks, and
that leads to a minor strangeness:

node1 # awk 'NR>1{print $2}' /sys/kernel/debug/dlm/mygfs_locks | sort | uniq -c | sort -k +2n
   2142 0
   1619 2
   2001 3
   1586 4
   1566 5
   1624 6
   1610 7
   1733 8
   1592 9
   1612 10

These numbers are much bigger than the counts on the 9 other nodes, e.g.

node2 # awk 'NR>1{print $2}' /sys/kernel/debug/dlm/mygfs_locks | sort | uniq -c | sort -k +2n
    441 0
   1630 1
     75 3
      2 4
     10 5
     25 7
     15 8
     38 10

Is that normal?
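
Coming back to the lock types: if the quoted r_name is the glock type
followed by the glock number, both in hex (an assumption on my part, though
r_len = 24 would fit an 8 + 16 character layout), then a rough sketch like
the following would tally the dump per node and per type. The function name
is just something I made up for illustration.

```shell
# Tally a DLM lock dump (e.g. /sys/kernel/debug/dlm/mygfs_locks) by
# nodeid and glock type. Reads the named file(s), or stdin if none given.
# ASSUMPTION: the quoted r_name is "<type> <number>", both in hex.
# Output columns: count, nodeid, type.
dlm_locks_by_node_and_type() {
    awk -F'"' 'NR > 1 && NF >= 3 {
        split($1, col, " ")       # col[2] = nodeid (second column)
        split($2, name, " ")      # name[1] = assumed glock type
        count[col[2] " " name[1]]++
    }
    END { for (k in count) print count[k], k }' "$@" | sort -k2,2n -k3,3
}
```

It would be run as: dlm_locks_by_node_and_type /sys/kernel/debug/dlm/mygfs_locks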

Using gfs_tool's lockdump I see

node1 # gfs_tool lockdump /newcache | egrep '^Glock' | sed 's?(\([0-9]*\).*)?\1?g' | sort | uniq -c
      3 Glock 1
    308 Glock 2
   1538 Glock 3
      2 Glock 4
    233 Glock 5
      2 Glock 8

Only the type 2 and type 5 counts seem to change. Across the cluster there
is one node with roughly 10x more type 2 and type 5 glocks than the others.
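
To follow the suggestion about type 2 glocks with many queued requests,
something along these lines might pick them out of the same lockdump. This
is only a sketch: it assumes each glock starts with a "Glock (type, number)"
line (which the egrep above relies on) and that every queued request shows
up as an indented line beginning with "Waiter". I have not checked that
second assumption against the dump format, and the function name is my own.

```shell
# Read `gfs_tool lockdump <mountpoint>` output on stdin and list glocks of
# one type (default 2) that have queued waiters, busiest first.
# ASSUMPTION: queued requests appear as indented lines starting "Waiter".
# Output columns: waiter count, then the original "Glock (...)" line.
glocks_with_waiters() {
    awk -v want="${1:-2}" '
    function report() { if (glock != "" && waiters > 0) print waiters, glock }
    /^Glock/ {
        report()                                # finish the previous glock
        glock = ""; waiters = 0
        gtype = $2; gsub(/[^0-9]/, "", gtype)   # "(2," -> "2"
        if (gtype == want) glock = $0
        next
    }
    glock != "" && $1 ~ /^Waiter/ { waiters++ }
    END { report() }' | sort -rn
}
```

Usage would be: gfs_tool lockdump /newcache | glocks_with_waiters 2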

# gfs_tool counters /newcache

                                  locks 2313
                             locks held 781
                           freeze count 0
                          incore inodes 230
                       metadata buffers 1061
                        unlinked inodes 28
                              quota IDs 2
                     incore log buffers 28
                         log space used 1.46%
              meta header cache entries 1304
                     glock dependencies 185
                 glocks on reclaim list 0
                              log wraps 91
                   outstanding LM calls 0
                  outstanding BIO calls 0
                       fh2dentry misses 0
                       glocks reclaimed 2125924
                         glock nq calls 801437507
                         glock dq calls 796261692
                   glock prefetch calls 319835
                          lm_lock calls 6396763
                        lm_unlock calls 1031709
                           lm callbacks 7669741
                     address operations 1267096416
                      dentry operations 35815146
                      export operations 0
                        file operations 233333825
                       inode operations 61818196
                       super operations 148712313
                          vm operations 87114
                        block I/O reads 0
                       block I/O writes 0

Not sure if anyone can make anything from all these numbers ...
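
Since most of these look like cumulative totals, the change between two
snapshots taken under load is probably more telling than the absolute
values. A minimal sketch for diffing two saved outputs of gfs_tool counters
(the function name and the snapshot file names are hypothetical):

```shell
# Print the counters whose value changed between two saved
# `gfs_tool counters` outputs ($1 = earlier snapshot, $2 = later one).
# Lines whose last field is not a plain integer (e.g. "1.46%") are skipped.
counters_delta() {
    awk '
    NF >= 2 && $NF ~ /^[0-9]+$/ {
        value = $NF
        name = $0
        sub(/[ \t]+[0-9]+[ \t]*$/, "", name)   # drop the trailing value
        sub(/^[ \t]+/, "", name)               # drop the leading indent
        if (NR == FNR)
            before[name] = value               # first file: remember value
        else if (name in before && value != before[name])
            printf "%-28s %.0f\n", name, value - before[name]
    }' "$1" "$2"
}
```

For example: gfs_tool counters /newcache > before.txt; sleep 60;
gfs_tool counters /newcache > after.txt; counters_delta before.txt after.txt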

Thanks,
Kevin

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster