Re: Re: Why GFS is so slow? What it is waiting for?


 



Hi, Martin:

Another big thanks to you for your kind reply and
suggestions.

Best,

Jas

--- Martin Fuerstenau <martin.fuerstenau@xxxxxxx>
wrote:

> Hi,
> 
> Unfortunately not. According to my information (which comes mainly from
> this list and from the wiki), this structure (the journal) is created on
> the filesystem for each node of the cluster. If you read the man page for
> gfs_fsck you will see that the filesystem must be unmounted from all
> nodes.
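
A rough sketch of that offline check, assuming the GFS volume is
/dev/vg_cluster/lv_gfs mounted on /mnt/gfs (both names are placeholders,
not taken from Martin's setup), run inside a maintenance window:

    # On every node: stop whatever uses the filesystem, then unmount it
    umount /mnt/gfs

    # On one node only: check and repair, answering yes to all prompts
    gfs_fsck -y /dev/vg_cluster/lv_gfs

    # Remount on all nodes afterwards
    mount /mnt/gfs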
> 
> If you have the problem I had, you should plan a maintenance window asap.
> 
> My problem started, as mentioned, with a slow GFS from the beginning and
> led to cluster crashes after 7 months. All my problems were fixed by the
> check. Perhaps it is the same with your system.
> 
> Yours - Martin
> 
> On Fri, 2008-05-09 at 04:51 -0700, Ja S wrote:
> > Hi Martin:
> > 
> > Thanks for your reply indeed.
> > 
> > --- Martin Fuerstenau <martin.fuerstenau@xxxxxxx>
> > wrote:
> > 
> > > Hi,
> > > 
> > > I had (nearly) the same problem: a slow GFS from the beginning. Two
> > > weeks ago the cluster crashed every time the load became heavier.
> > > 
> > > What was the reason? A rotten GFS. GFS uses leaf nodes for data and
> > > leaf nodes for metadata within the filesystem, and the problem was in
> > > the metadata leaf nodes.
> > > 
> > > Have you checked the filesystem? Unmount it from all nodes and run
> > > gfs_fsck on the filesystem.
> > 
> > No, not yet. I am afraid I cannot unmount the file system and then run
> > gfs_fsck, since server downtime is totally forbidden.
> > 
> > Is there any other way to reclaim the unused or lost blocks? (I guess
> > the leaf nodes you mentioned are the disk blocks; correct me if I am
> > wrong.)
> > 
> > Should "gfs_tool settune /mnt/points inoded_secs 10" work for a heavily
> > loaded node with frequent create and delete file operations?
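
If it helps, a minimal way to experiment with that tunable, assuming the
filesystem is mounted on /mnt/gfs (a placeholder path), would be something
like:

    # Show the current value of inoded_secs (the inode daemon interval)
    gfs_tool gettune /mnt/gfs | grep inoded_secs

    # Lower the interval so unlinked inodes are deallocated more often
    gfs_tool settune /mnt/gfs inoded_secs 10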
> > 
> > 
> > > In my case it reported (and repaired) tons of unused leaf nodes and
> > > some other errors. The first time I started it without -y (for yes).
> > > Well, after one hour of typing y I killed it and restarted it with -y.
> > > The work was done within an hour for 1 TB. Now the filesystem is clean
> > > and it was like a turbocharger and nitrogen injection for a car.
> > > Faster than it ever was before.
> > 
> > Great, that sounds fantastic. However, if the low performance is caused
> > by a "rotten" GFS, could your now-clean filesystem get messed up again
> > after a certain period? Do you have a smart way to monitor the status of
> > your filesystem, so you can plan a regular downtime schedule and "force"
> > your manager to approve it :-) ? If you do, I am eager to know.
> > 
> > Thanks again, and I look forward to your next reply.
> > 
> > Best,
> > 
> > Jas
> > 
> > 
> > 
> > 
> > > Maybe there is a bug in the mkfs command or something similar. I will
> > > never use a GFS again without a filesystem check after creation.
> > > 
> > > Martin Fuerstenau
> > > Senior System Engineer
> > > Oce Printing Systems, Poing
> > > 
> > > On Fri, 2008-05-09 at 02:25 -0700, Ja S wrote:
> > > > Hi, Klaus:
> > > > 
> > > > Thank you very much for your kind answer.
> > > > 
> > > > Tuning the parameters sounds really interesting. I should give it a
> > > > try.
> > > > 
> > > > By the way, how did you come up with these new parameter values? Did
> > > > you calculate them from some measurements, or simply pick them and
> > > > test?
> > > > 
> > > > Best,
> > > > 
> > > > Jas
> > > > 
> > > > 
> > > > --- Klaus Steinberger <Klaus.Steinberger@xxxxxxxxxxxxxxxxxxxxxx> wrote:
> > > > 
> > > > > Hi,
> > > > > 
> > > > > > However, it took ages to list the subdirectory on an absolutely
> > > > > > idle cluster node. See below:
> > > > > >
> > > > > > # time ls -la | wc -l
> > > > > > 31767
> > > > > >
> > > > > > real    3m5.249s
> > > > > > user    0m0.628s
> > > > > > sys     0m5.137s
> > > > > >
> > > > > > About 3 minutes are spent somewhere. Does anyone have any clue
> > > > > > what the system was waiting for?
> > > > > 
> > > > > Did you tune glocks? I found that it's very important for GFS
> > > > > performance.
> > > > > 
> > > > > I'm currently doing the following tunings:
> > > > > 
> > > > > gfs_tool settune /export/data/etp quota_account 0
> > > > > gfs_tool settune /export/data/etp glock_purge 50
> > > > > gfs_tool settune /export/data/etp demote_secs 200
> > > > > gfs_tool settune /export/data/etp statfs_fast 1
> > > > > 
> > > > > Switch off quota, of course, only if you don't need it. All these
> > > > > tunings have to be redone after every mount, so put them in an
> > > > > init.d script that runs after the GFS mount, and of course do this
> > > > > on every node.
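
A minimal sketch of such an init.d script, reusing the /export/data/etp
mount point from the commands above (the script name and the chkconfig
priorities are assumptions, not part of Klaus's setup):

    #!/bin/sh
    # /etc/init.d/gfs-tune: reapply GFS tunables after the GFS mount
    # chkconfig: 345 99 01  (pick a start number later than the gfs script)

    MNT=/export/data/etp

    case "$1" in
      start)
        gfs_tool settune $MNT quota_account 0
        gfs_tool settune $MNT glock_purge 50
        gfs_tool settune $MNT demote_secs 200
        gfs_tool settune $MNT statfs_fast 1
        ;;
      stop)
        # nothing to undo; the tunables are lost at unmount anyway
        ;;
      *)
        echo "Usage: $0 {start|stop}"
        exit 1
        ;;
    esac
    exit 0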
> > > > > 
> > > > > Here is the link to the glock paper:
> > > > > 
> > > > > http://people.redhat.com/wcheng/Patches/GFS/readme.gfs_glock_trimming.R4
> > > > > 
> > > > > The glock tuning (the glock_purge and demote_secs parameters)
> > > > > definitely solved a problem we had here with the Tivoli backup
> > > > > client. Before, it ran for days and sometimes even gave up; we
> > > > > observed heavy lock traffic.
> > > > > 
> > > > > After changing the glock parameters, backup times went down
> > > > > dramatically; we can now run an incremental backup on a 4 TByte
> > > > > filesystem in under 4 hours. So give it a try.
> > > > > 
> > > > > There is some more tuning which, unfortunately, can only be done
> > > > > when the filesystem is created. The default number of resource
> > > > > groups is way too large for today's TByte filesystems.
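
For what it's worth, the resource group size can be chosen with -r (in MB)
at creation time; a sketch along these lines, where the lock table, journal
count, device path and the 2048 MB value are all placeholders:

    # Larger resource groups mean far fewer of them to scan on a
    # multi-TByte filesystem than the default size would create
    gfs_mkfs -p lock_dlm -t mycluster:mygfs -j 4 -r 2048 /dev/vg_cluster/lv_gfs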
> > > > > 
> > > > > Sincerely,
> > > > > Klaus
> > > > > 
> > > > > 
> > > > > -- 
> > > > > Klaus Steinberger         Beschleunigerlaboratorium
> > > > > Phone: (+49 89)289 14287  Am Coulombwall 6, D-85748 Garching, Germany
> > > > > FAX:   (+49 89)289 14280  EMail: Klaus.Steinberger@xxxxxxxxxxxxxxxxxxxxxx
> > > > > URL: http://www.physik.uni-muenchen.de/~Klaus.Steinberger/
> > > 
> > > Martin Fürstenau        Tel.    : (49) 8121-72-4684
> > > Oce Printing Systems    Fax     : (49) 8121-72-4996
> > > OI-12                   E-Mail  : martin.fuerstenau@xxxxxxx
> > > Siemensallee 2
> > > 85586 Poing
> > > Germany




--
Linux-cluster mailing list
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster
