My symptoms are similar. dlm_send sits on all of the CPU, and top shows
the CPU spending nearly all of its time in sys or interrupt handling.
Disk and network I/O aren't very high (as seen via iostat and iptraf),
but SMB/NFS throughput and latency are horrible. Context switches per
second as seen by vmstat are in the 20,000+ range (I don't know if this
is high, though; I haven't really paid attention to it in the past).
Nothing crashes, and the node can still serve data (very slowly), and
eventually the load and latency recover.

As an aside, does anyone know how to _view_ the resource group size of a
GFS file system after it has been created?

Thanks,
-Andrew

On Tue, April 1, 2008 6:30 pm, David Ayre wrote:
> What do you mean by pounded, exactly?
>
> We have an ongoing, similar issue... when we have about a dozen users
> using both smb/nfs, at some seemingly random point in time our
> dlm_senddd chews up 100% of the CPU... then dies down on its own after
> quite a while. Killing SMB processes or shutting down SMB didn't seem
> to have any effect... only a reboot cures it. I've seen this described
> (if this is the same issue) as a "soft lockup", as it does seem to come
> back to life:
>
> http://lkml.org/lkml/2007/10/4/137
>
> We've been assuming it's a kernel/dlm version issue, as we are running
> 2.6.9-55.0.6.ELsmp with dlm-kernel 2.6.9-46.16.0.8.
>
> We were going to try a kernel update this week... but you seem to be
> using a later version and still have this problem?
>
> Could you elaborate on "getting pounded by dlm"? I've posted about this
> on this list in the past but received no assistance.
>
> On 1-Apr-08, at 5:19 PM, Andrew A. Neuschwander wrote:
>
>> I have a GFS cluster with one node serving files via smb and nfs.
>> Under fairly light usage (5-10 users) the CPU is getting pounded by
>> dlm. I am using CentOS 5.1 with the included kernel
>> (2.6.18-53.1.14.el5). This sounds like the dlm issue mentioned back
>> in March of last year
>> (https://www.redhat.com/archives/linux-cluster/2007-March/msg00068.html)
>> that was resolved in 2.6.21.
>>
>> Has this fix been (or will it be) backported to the current el5
>> kernel? Will it be in RHEL 5.2? What is the easiest way for me to
>> get this fix?
>>
>> Also, if I try a newer kernel on this node, will there be any harm
>> in the other nodes using their current kernel?
>>
>> Thanks,
>> -Andrew
>> --
>> Andrew A. Neuschwander, RHCE
>> Linux Systems Administrator
>> Numerical Terradynamic Simulation Group
>> College of Forestry and Conservation
>> The University of Montana
>> http://www.ntsg.umt.edu
>> andrew@xxxxxxxxxxxx - 406.243.6310
>
> ~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~
> David Ayre
> Programmer/Analyst - Information Technology Services
> Emily Carr Institute of Art and Design
> Vancouver, B.C. Canada
> 604-844-3875 / david@xxxxxxxx

--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
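
P.S. On my own resource group question: I haven't tried this on the
problem cluster yet, but I believe gfs_tool can dump the resource group
index from a mounted GFS file system, so something along these lines
should show it (the mount point below is just a placeholder; check the
gfs_tool man page for the exact subcommand names and output fields):

    gfs_tool df /mnt/gfs      # prints fs stats, including the block size
    gfs_tool rindex /mnt/gfs  # one entry per resource group; ri_data is the
                              # data-block count, so an RG's size is roughly
                              # ri_data * block size

If I've got that wrong, I'd appreciate a correction.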