Re: GFS cluster / DLM locking - Mostly idle but high load

Marc Grimme <grimme@xxxxxxx> · Wed, 17 Oct 2007 10:16:30 +0200



On Wednesday 17 October 2007 09:52:20 Gordan Bobic wrote:
> On Wed, 17 Oct 2007, Marc Grimme wrote:
> >>>> I have a cluster (3 nodes at the moment, may grow up to 16) for
> >>>> handling a lot of small files (Maildir). When I test the system by
> >>>> sending around 3-5 messages/second I see the load on the cluster nodes
> >>>> go up to about 20-30, even though the CPUs on the cluster are about
> >>>> 90% idle at all times.
> >>>>
> >>>> I am guessing that this is due to the clustered machines waiting for
> >>>> DLM locks to be established, which causes a lot of processes to be
> >>>> fighting to run, but since they don't get to run very soon, they back
> >>>> up and cause the load averages to go up.
> >>>>
> >>>> Assuming the DLM runs over the interface specified by IP and MAC in
> >>>> cluster.conf, it is running over gigabit ethernet.
> >>>>
> >>>> Are there any configuration changes or tuning parameters I can apply
> >>>> to DLM to alleviate this condition? The machine I'm running the test
> >>>> from (the one sending messages) is about 1/4 of the spec of each of
> >>>> the cluster nodes, and it's running a load average of about 0.4. It
> >>>> seems crazy that a single low-spec node should be able to completely
> >>>> overwhelm a cluster 12x its spec several times over.
> >>>
> >>> I don't know a lot about GFS but since no one else has replied yet, my
> >>> understanding is that it's not suitable for applications like what
> >>> you describe (many small files being opened frequently). I think GFS2,
> >>> which is still a tech preview, has been redesigned to improve this
> >>> situation.
> >>
> >> Indeed, I am aware that GFS2 is still broken, but I seem to be getting
> >> no worse a performance out of GFS than I get out of NFS. The only
> >> penalty is the high load, but the throughput is actually similar. The
> >> advantage that makes GFS win is that I don't need an arbitrating server
> >> to handle the NFS exports, which makes the clustering and redundancy a
> >> bit tidier.
> >
> > In your testing, did you also try adjusting the DLM table sizes
> > rsbtbl_size/lkbtbl_size? I would be quite interested to know whether
> > this improves your performance.
>
> I cannot find these files in /proc (that's where they are implied to be in
> the docs). Can you please point me in the right direction?
Sorry, I knew I forgot something ;-)
http://www.opensharedroot.org/Members/marc/blog/blog-on-dlm/red-hat-dlm-__find_lock_by_id/influence-of-locktable-sizes-rsbtbl_size-lkbtbl_size
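In case the page above is unavailable, here is a minimal sketch for checking where the two tunables live on a given node and what they are currently set to. The two directories listed are assumptions and are not taken from this thread: as far as I know, older releases expose the DLM settings under /proc and newer ones through configfs, so the script simply tries both and reports whatever it finds. The name find_tunables is just a helper made up for this example.

```python
#!/usr/bin/env python
"""Locate the DLM lock table tunables and print their current values."""
import os

# Assumed locations (not confirmed in this thread): a /proc path for older
# releases and a configfs path for newer ones. Adjust for your system.
CANDIDATE_DIRS = [
    "/proc/cluster/config/dlm",
    "/sys/kernel/config/dlm/cluster",
]
TUNABLES = ["rsbtbl_size", "lkbtbl_size"]


def find_tunables(dirs=CANDIDATE_DIRS, names=TUNABLES):
    """Return {path: value} for every tunable file that exists and is readable."""
    found = {}
    for d in dirs:
        for name in names:
            path = os.path.join(d, name)
            try:
                with open(path) as f:
                    found[path] = f.read().strip()
            except (IOError, OSError):
                # File not present on this kernel, or the DLM is not loaded.
                continue
    return found


if __name__ == "__main__":
    values = find_tunables()
    if not values:
        print("No DLM table tunables found; is the dlm module loaded?")
    for path, value in sorted(values.items()):
        print("%s = %s" % (path, value))
```

Changing a value would mean writing a new number to the same file as root. My understanding is that new sizes only apply to lockspaces created afterwards, so they would have to be set before the GFS filesystems are mounted; please verify this on a test node first.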
>
> > Do you have a lot of small files?
>
> Yes. The problem doesn't seem to be so bad when files are in different
> directories, but when lots of files are being written to the same
> directory, the load goes up quite badly.
Then increasing the lock table sizes should help. Also enable lock_purging
if you have not already done so.
>
> Gordan
Marc.
>
> --
> Linux-cluster mailing list
> Linux-cluster@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/linux-cluster


-- 
Gruss / Regards,

Marc Grimme
http://www.atix.de/               http://www.open-sharedroot.org/
