Chris Feist wrote:
> Yes, issue #2 could definitely be the cause of your first issue.
> Unfortunately you'll need to bring down your cluster to change the value
> of lt_high_locks. What is its value currently? And how much memory do
> you have on your gulm lock servers? You'll need about 256MB of RAM for
> gulm for every 1 million locks (plus enough for the kernel and any
> other processes).
>
> On each of the gulm clients you can also cat /proc/gulm/lockspace to
> see which client is using most of the locks.
Thanks for the response! I figured I would probably have to bring down
the cluster to change the highwater setting, but I was holding out some
hope that it could be changed dynamically. Oh well.
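In the meantime, I can at least act on the /proc/gulm/lockspace
suggestion and watch the per-client counts. Here's a rough sketch of
what I have in mind (node1..node3 are placeholders for my actual client
hostnames, and it assumes passwordless ssh to each):

#!/bin/sh
# Pull the gulm lock counts from each client to see which one is
# holding most of the locks.  node1..node3 are placeholder names.
for host in node1 node2 node3; do
    echo "== $host =="
    ssh "$host" 'grep -E "total:|exl:|shd:|lvbs:" /proc/gulm/lockspace'
done
# Memory math from above, for reference: at ~256MB per million locks,
# even the default highwater of ~1.04M locks costs gulm only about
# 270MB, so the 4GB on each lock server is plenty overall.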
The value is currently at the default, which I want to say is something
like 1.04M. These machines are both lock servers and Samba/NFS servers,
and all three lock servers in the cluster have 4GB of RAM. A previous
Red Hat service call has me running the hugemem kernel on all three (the
issue there was that, under even light load, lowmem would be exhausted
and the machines would enter an OOM spiral of death).

Now that I have turned off hyperthreading, though, memory usage seems to
be dramatically lower than it was before that change. For instance, the
machine running Samba services has been up since I turned off
hyperthreading on Friday night, and today it was under some pretty heavy
load. On a normal day, prior to the hyperthreading change, I'd be down
to maybe 500MB of lowmem free by now (out of 3GB), and the only way to
completely reclaim that memory was to reboot. Instead, I'm sitting here
looking at this machine, and it has 3.02GB of 3.31GB of lowmem free.
I'll have to let this run for a while to determine whether this is a red
herring, but it looks much better than it ever has in the past.
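To keep an eye on it while it runs, I'm logging lowmem every few minutes
with a loop along these lines (the log path is arbitrary, and
LowTotal/LowFree are what /proc/meminfo calls the lowmem fields on these
kernels, as best I can tell):

#!/bin/sh
# Snapshot lowmem every 5 minutes so I can tell whether the
# hyperthreading change really stopped the slow lowmem bleed.
while true; do
    date >> /var/tmp/lowmem.log
    grep -E 'LowTotal|LowFree' /proc/meminfo >> /var/tmp/lowmem.log
    sleep 300
done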
Here's the interesting output from the /proc/gulm gadgets (note that, at
the time I grabbed these, I was seeing the "more than the max" message
logged to syslog once or twice a minute, but not at the 10-second rate
that I read about previously):
[root@xxxxx root]# cat /proc/gulm/filesystems/data0
Filesystem: data0
JID: 0
handler_queue_cur: 0
handler_queue_max: 26584
[root@xxxxx root]# cat /proc/gulm/filesystems/data1
Filesystem: data1
JID: 0
handler_queue_cur: 0
handler_queue_max: 4583
[root@xxxxx root]# cat /proc/gulm/filesystems/data2
Filesystem: data2
JID: 0
handler_queue_cur: 0
handler_queue_max: 11738
[root@xxxxx root]# cat /proc/gulm/lockspace
lock counts:
total: 41351
unl: 29215
exl: 3
shd: 12055
dfr: 0
pending: 0
lvbs: 16758
lops: 12597867
[root@xxxxx root]#
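For what it's worth, here's roughly how I'm gauging that syslog message
rate (the grep pattern is just the distinctive fragment of the warning
quoted above; adjust it to match the full message text):

# Count the highwater warnings so far, then eyeball the timestamps on
# the most recent few to estimate the rate.
grep -c "more than the max" /var/log/messages
grep "more than the max" /var/log/messages | tail -5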