Re: High system CPU usage in one node of a two-node cluster

Thanks Patrick,

I have tried to get the locks for Magma on both nodes,
and I get the same error as in
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634:

cat: /proc/cluster/dlm_locks: Cannot allocate memory
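
The sequence I am using, following the steps you describe below (assuming
the rgmanager lockspace is the one shown as "Magma" by cman_tool services),
is roughly:

# cman_tool services
# echo "Magma" > /proc/cluster/dlm_locks
# cat /proc/cluster/dlm_locks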

I will try to install the RPMs from Lon if I can and
see if that solves the problem...
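
Once the dump works, a rough way to check whether the lock list is really
growing, as Lon speculates below, would be to repeat the dump every so
often and compare its size (just a sketch, assuming the size of the dump
grows with the number of held locks):

# echo "Magma" > /proc/cluster/dlm_locks
# cat /proc/cluster/dlm_locks | wc -l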

Marco 

> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx 
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On behalf of 
> Patrick Caulfield
> Sent: Friday, 5 January 2007 11:13
> To: linux clustering
> Subject: Re: High system CPU usage in one node of 
> a two-node cluster
> 
> 
> Lon Hohberger wrote:
> > On Wed, 2007-01-03 at 12:35 +0100, Marco Lusini wrote:
> >> Hi all,
> >>  
> >> I have 3 2-node clusters, running just cluster suite, without gfs, 
> >> each one updated with the latest packages released by RHN.
> >>  
> >> In each cluster one of the two nodes has a steadily growing system 
> >> CPU usage, which seems to be consumed by clurgmgrd and dlm_recvd.
> >> As an example here is the running time accumulated on one cluster 
> >> since 20 December when it was rebooted:
> >>  
> >> [root@estestest ~]# ps axo pid,start,time,args
> >>   PID  STARTED     TIME COMMAND
> >> ...
> >> 10221   Dec 20 10:37:05 clurgmgrd
> >> 11169   Dec 20 06:48:24 [dlm_recvd]
> >> ...
> >>  
> >> [root@frascati ~]# ps axo pid,start,time,args
> >>   PID  STARTED     TIME COMMAND
> >> ...
> >>  6226   Dec 20 00:04:17 clurgmgrd
> >>  8249   Dec 20 00:00:19 [dlm_recvd]
> >> ...
> 
> I suspect these two being at the top are related. If 
> clurgmgrd is taking out locks, then dlm_recvd will also be busy.
> 
> >> I attach two graphs made with RRD which show that the system CPU 
> >> usage is steadily growing;
> >> note how the trend changed after the reboot on 20 December.
> > 
> >> Of course as the system usage increases so does the system load and I 
> >> am afraid of what will happen after 1-2 months of uptime...
> > 
> > System load averages are the average of the number of processes on the 
> > run queue over the past 1, 5, and 15 minutes.  It doesn't generally 
> > trend upwards over time; if that were the case, I'd be in trouble:
> > 
> > ...
> > 28204 15:11:11 01:04:19 /usr/lib/firefox-1.5.0.9/firefox-bin -UILocale 
> > en-US ...
> > 
> > However, it is a little odd that you had 10 hours of runtime for 
> > clurgmgrd and over 6 for dlm_recvd.  Just taking a wild guess, but it 
> > looks like the locks were all mastered on frascati.
> > 
> > How many services are you running?
> > 
> > Also, take a look at:
> > 
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634
> > 
> > The RPMs there might solve the problem with dlm_recvd.  Rgmanager in 
> > some situations causes a strange leak of NL locks in the DLM.  If 
> > dlm_recvd has to traverse lock lists and that list is ever-growing 
> > (total speculation here), it could explain the amount of consumed 
> > system time.
> > 
> 
> 
> Yes, the DLM will do a lot of traversing of lock lists if there are 
> a lot of similar locks on one resource. VMS has an 
> optimisation for this, known as the group grant and conversion 
> grant modes, which we don't currently implement.
> 
> 
> > How can I get more info on this? I checked /proc/cluster/dlm_locks on 
> > both nodes and it is empty.
> 
> /proc/cluster/dlm_locks needs to be told which lockspace to 
> use. Just catting that file after bootup will show nothing.
> What you need to do is echo the lockspace name into that 
> file, then look at it. You can get the lockspace names with 
> the "cman_tool services" command, e.g.:
> 
> # cman_tool services
> 
> Service          Name                              GID LID State     Code
> Fence Domain:    "default"                           1   2 run       -
> [1 2]
> 
> DLM Lock Space:  "clvmd"                             2   3 run       -
> [1 2]
> 
> # echo "clvmd" > /proc/cluster/dlm_locks
> # cat /proc/cluster/dlm_locks
> 
> This shows locks held by clvmd. If you want to look at 
> another lockspace just echo the other name into the /proc file.
> -- 
> 
> patrick
> 


--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
