Dave,

Thanks for the update. I had considered that and I'm set up to be able to
do it. Now that someone else has tried it with positive results, I think
I'll give it a try.

Thanks,
-Andrew

On Thu, April 3, 2008 3:29 pm, David Ayre wrote:
> Some progress...
>
> We had another dlm_sendd lockup yesterday, which prompted us to do some
> reworking of our file sharing. Previously we had both SMB and NFS
> services competing for GFS resources on this particular node. We
> thought perhaps it was this combination that may have provoked the
> lockups... so we moved things around with the help of another server
> in our GFS cluster.
>
> Previously we had:
>
>   Machine A (nfs and smb services sitting on top of gfs)
>     NFS  SMB
>     GFS
>
> And switched things around to this:
>
>   Machine A
>     SMB
>     NFS -> Machine B
>
>   Machine B
>     NFS
>     GFS
>
> Basically, we moved all NFS mounts to machine B... NFS is the only
> file sharing service using GFS on that machine, and we changed machine
> A to use an NFS mount to machine B. This way we don't have any nodes
> with both SMB and NFS services running on top of GFS.
>
> Previously we had 1-2 lockups a day, but today nothing... so far so
> good. Not sure if this configuration will work for you... let me
> know if you need any further clarification.
>
> d
>
>
> On 1-Apr-08, at 5:51 PM, Andrew A. Neuschwander wrote:
>
>> My symptoms are similar. dlm_sendd sits on all of the cpu. Top shows
>> the cpu spending nearly all of its time in sys or interrupt handling.
>> Disk and network I/O aren't very high (as seen via iostat and
>> iptraf), but SMB/NFS throughput and latency are horrible. Context
>> switches per second as seen by vmstat are in the 20,000+ range (I
>> don't know if this is high, though; I haven't really paid attention
>> to it in the past). Nothing crashes, and the node is still able to
>> serve data (very slowly), and eventually the load and latency
>> recover.
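One way to sanity-check that context-switch figure outside of vmstat: a
minimal sketch, assuming a Linux host, that reads the cumulative "ctxt"
counter in /proc/stat (the same counter vmstat's "cs" column is derived
from) and samples it twice to get a per-second rate.

```python
import time

def context_switches():
    """Return the cumulative context-switch count since boot.

    /proc/stat carries one line of the form "ctxt <count>".
    """
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("ctxt "):
                return int(line.split()[1])
    raise RuntimeError("no ctxt line found in /proc/stat")

def ctxt_per_second(interval=1.0):
    """Sample the counter twice, interval seconds apart."""
    start = context_switches()
    time.sleep(interval)
    return (context_switches() - start) / interval

if __name__ == "__main__":
    print(f"context switches/s: {ctxt_per_second():.0f}")
```

Run a few samples while the box is idle versus under SMB/NFS load to get
a feel for what "high" means on that hardware.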
>>
>> As an aside, does anyone know how to _view_ the resource group size
>> after file system creation on GFS?
>>
>> Thanks,
>> -Andrew
>>
>>
>> On Tue, April 1, 2008 6:30 pm, David Ayre wrote:
>>> What do you mean by pounded, exactly?
>>>
>>> We have a similar ongoing issue... when we have about a dozen users
>>> using both smb/nfs, at some seemingly random point in time our
>>> dlm_sendd chews up 100% of the CPU... then dies down on its own
>>> after quite a while. Killing SMB processes and shutting down SMB
>>> didn't seem to have any effect... only a reboot cures it. I've seen
>>> this described (if this is the same issue) as a "soft lockup", as it
>>> does seem to come back to life:
>>>
>>> http://lkml.org/lkml/2007/10/4/137
>>>
>>> We've been assuming it's a kernel/dlm version issue, as we are
>>> running 2.6.9-55.0.6.ELsmp with dlm-kernel 2.6.9-46.16.0.8.
>>>
>>> We were going to try a kernel update this week... but you seem to be
>>> using a later version and still have this problem?
>>>
>>> Could you elaborate on "getting pounded by dlm"? I've posted about
>>> this on this list in the past but received no assistance.
>>>
>>>
>>> On 1-Apr-08, at 5:19 PM, Andrew A. Neuschwander wrote:
>>>
>>>> I have a GFS cluster with one node serving files via smb and nfs.
>>>> Under fairly light usage (5-10 users) the cpu is getting pounded by
>>>> dlm. I am using CentOS 5.1 with the included kernel
>>>> (2.6.18-53.1.14.el5). This sounds like the dlm issue mentioned back
>>>> in March of last year
>>>> (https://www.redhat.com/archives/linux-cluster/2007-March/msg00068.html)
>>>> that was resolved in 2.6.21.
>>>>
>>>> Has this fix been (or will it be) backported to the current el5
>>>> kernel? Will it be in RHEL 5.2? What is the easiest way for me to
>>>> get this fix?
>>>>
>>>> Also, if I try a newer kernel on this node, will there be any harm
>>>> in the other nodes using their current kernel?
>>>>
>>>> Thanks,
>>>> -Andrew
>>>> --
>>>> Andrew A. Neuschwander, RHCE
>>>> Linux Systems Administrator
>>>> Numerical Terradynamic Simulation Group
>>>> College of Forestry and Conservation
>>>> The University of Montana
>>>> http://www.ntsg.umt.edu
>>>> andrew@xxxxxxxxxxxx - 406.243.6310
>>>>
>>>> --
>>>> Linux-cluster mailing list
>>>> Linux-cluster@xxxxxxxxxx
>>>> https://www.redhat.com/mailman/listinfo/linux-cluster
>>>
>>> ~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~_~
>>> David Ayre
>>> Programmer/Analyst - Information Technology Services
>>> Emily Carr Institute of Art and Design
>>> Vancouver, B.C. Canada
>>> 604-844-3875 / david@xxxxxxxx
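For reference, the machine A / machine B split David describes at the top
of the thread might look roughly like this on an EL5-era setup. The
hostnames (machinea, machineb) and the /gfs path are placeholders, not
taken from the thread; adjust the export and mount options to taste.

```
# /etc/exports on machine B (GFS mounted locally, exported over NFS):
/gfs    machinea(rw,sync,no_root_squash)

# /etc/fstab entry on machine A (local GFS mount replaced by an NFS
# mount of machine B, so SMB on A no longer touches GFS directly):
machineb:/gfs    /gfs    nfs    rw,hard,intr    0 0
```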