Thanks Patrick,

I have tried to get the locks for Magma on both nodes, and I get the
same error as in https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634:

cat: /proc/cluster/dlm_locks: Cannot allocate memory

I will try to install the RPMs from Lon if I can and see if that
solves the problem...

Marco

> -----Original Message-----
> From: linux-cluster-bounces@xxxxxxxxxx
> [mailto:linux-cluster-bounces@xxxxxxxxxx] On behalf of
> Patrick Caulfield
> Sent: Friday, 5 January 2007 11:13
> To: linux clustering
> Subject: Re: High system CPU usage in one of a two node cluster
>
> Lon Hohberger wrote:
> > On Wed, 2007-01-03 at 12:35 +0100, Marco Lusini wrote:
> >> Hi all,
> >>
> >> I have 3 2-node clusters, running just cluster suite, without gfs,
> >> each one updated with the latest packages released by RHN.
> >>
> >> In each cluster one of the two nodes has a steadily growing system
> >> CPU usage, which seems to be consumed by clurgmgrd and dlm_recvd.
> >> As an example here is the running time accumulated on one cluster
> >> since 20 December, when it was rebooted:
> >>
> >> [root@estestest ~]# ps axo pid,start,time,args
> >>   PID  STARTED     TIME COMMAND
> >> ...
> >> 10221   Dec 20 10:37:05 clurgmgrd
> >> 11169   Dec 20 06:48:24 [dlm_recvd]
> >> ...
> >>
> >> [root@frascati ~]# ps axo pid,start,time,args
> >>   PID  STARTED     TIME COMMAND
> >> ...
> >>  6226   Dec 20 00:04:17 clurgmgrd
> >>  8249   Dec 20 00:00:19 [dlm_recvd]
> >> ...
>
> I suspect these two being at the top are related. If
> clurgmgrd is taking out locks then dlm_recvd will also be busy.
>
> >> I attach two graphs made with RRD which show that the system CPU
> >> usage is steadily growing:
> >> note how the trend changed after the reboot on 20 December.
> >
> >> Of course as the system usage increases so does the system load and I
> >> am afraid of what will happen after 1-2 months of uptime...
> >
> > System load averages are the average of the number of processes on the
> > run queue over the past 1, 5, and 15 minutes. It doesn't generally
> > trend upwards over time; if that were the case, I'd be in trouble:
> >
> > ...
> > 28204 15:11:11 01:04:19 /usr/lib/firefox-1.5.0.9/firefox-bin -UILocale en-US
> > ...
> >
> > However, it is a little odd that you had 10 hours of runtime for
> > clurgmgrd and over 6 for dlm_recvd. Just taking a wild guess, but it
> > looks like the locks were all mastered on frascati.
> >
> > How many services are you running?
> >
> > Also, take a look at:
> >
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=212634
> >
> > The RPMs there might solve the problem with dlm_recvd. Rgmanager in
> > some situations causes a strange leak of NL locks in the DLM. If
> > dlm_recvd has to traverse lock lists and that list is ever-growing
> > (total speculation here), it could explain the amount of consumed
> > system time.
>
> Yes, DLM will do a lot of traversing lock lists if there are
> a lot of similar locks on one resource. VMS has an
> optimisation on this known as the group grant and conversion
> grant modes that we don't currently implement.
>
> > How can I get more info on this? I checked /proc/cluster/dlm_locks on
> > both nodes and it is empty.
>
> /proc/cluster/dlm_locks needs to be told which lockspace to
> use. Just catting that file after bootup will show nothing.
> What you need to do is to echo the lockspace name into that
> file, then look at it. You can get the lockspace names with
> the "cman_tool services" command, so (e.g.)
>
> # cman_tool services
>
> Service          Name          GID LID State     Code
> Fence Domain:    "default"       1   2 run       -
> [1 2]
>
> DLM Lock Space:  "clvmd"         2   3 run       -
> [1 2]
>
> # echo "clvmd" > /proc/cluster/dlm_locks
> # cat /proc/cluster/dlm_locks
>
> This shows locks held by clvmd. If you want to look at
> another lockspace just echo the other name into the /proc file.
> --
>
> patrick
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster
>
> _______________________________________________________
> Message scanned and protected by antivirus technology
>
> Service provided by the information system of the Presidency of the
> Council of Ministers
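
The procedure Patrick describes above (take the lockspace names from
"cman_tool services", echo each one into /proc/cluster/dlm_locks, then
read the file back) could be sketched as a small shell script. This is
only a sketch under assumptions: the helper names list_lockspaces and
dump_lockspace and the DLM_LOCKS_FILE variable are made up for
illustration, the parsing assumes the "cman_tool services" layout shown
in the quoted output, and the line count is just a rough indicator of
how large each lockspace's dump is. On a node affected by the bug
mentioned at the top, the read may still fail with "Cannot allocate
memory".

```shell
#!/bin/sh
# Sketch only: the helper names and DLM_LOCKS_FILE are illustrative,
# not part of cman or the DLM.

# Path of the proc file described in the thread; overridable for dry runs.
DLM_LOCKS_FILE="${DLM_LOCKS_FILE:-/proc/cluster/dlm_locks}"

# Read `cman_tool services` output on stdin and print one lockspace name
# per line, taken from lines of the form: DLM Lock Space:  "name" ...
list_lockspaces() {
    sed -n 's/^[[:space:]]*DLM Lock Space:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Select a lockspace by writing its name to the proc file, then read the
# file back (the echo-then-cat sequence from the thread).
dump_lockspace() {
    echo "$1" > "$DLM_LOCKS_FILE" || return 1
    cat "$DLM_LOCKS_FILE"
}

# Only touch the real proc file on a node that actually has cman_tool.
if command -v cman_tool >/dev/null 2>&1; then
    cman_tool services | list_lockspaces | while read -r name; do
        echo "== lockspace: $name =="
        # Number of lines in the dump, as a rough size indicator.
        dump_lockspace "$name" | wc -l
    done
fi
```

Running something like this periodically on both nodes and comparing
the counts might show whether one lockspace keeps growing, which is
what the suspected NL lock leak would be expected to look like.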