Re: ext3 with quota under heavy load.

Dale <lnxus@yahoo.com> · Thu, 26 Jun 2003 12:19:50 -0700 (PDT)

--- Andreas Dilger <adilger@clusterfs.com> wrote:
> On Jun 26, 2003  06:46 -0700, Dale wrote:
> > I have a problem with an NFS server for my network.  It has run
> > kernels 2.4.18-ac4 through 2.4.21-ac1, all with problems.  The -ac
> > patches are used to provide the new style quota support.  The system
> > seems to have gotten even less stable with the new kernel versions.
> > 
> > This morning around 5 am, I got a page that the system was not
> > responding to NFS requests.  I ssh'd in and found the loadavg at
> > ~50.  Below are some snippets from ps at the time:
> > 
> > root      3414  0.8  0.1  3904 3048 ?        DN   04:02   1:45
> > /usr/bin/updatedb -f NFS,SMBFS,NCPFS,PROC,DEVPTS -e /tmp,/var/tmp,/us
> > root      3979  0.0  0.0  2588 1192 ?        DN   04:14   0:00
> > /usr/bin/rsync -aH --delete /home/puser1 /home/puser2 /home/puser3
> > 
> > The rsync command is backing up across the network to a backup NFS
> > server.  updatedb starts at 4:02 am, and the rsync had been running
> > since 3:30 and was about half-way done (estimated from the 'p' in the
> > username).
> > 
> > There were also 32 nfsd's just like this:
> > root  851  0.0  0.0   0    0 ?    DW   Jun19   4:35 [nfsd]
> > 
> > and these (the other 4 kjournald's were in SW):
> > root   7  0.1  0.0   0    0 ?     DW   Jun19  17:04 [kswapd]
> > root 144  0.0  0.0   0    0 ?     DW   Jun19   6:53 [kjournald]
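Every process in the snippets above has a STAT beginning with "D" (uninterruptible sleep, usually waiting on I/O or a kernel lock), which is what a hung filesystem typically looks like. As a minimal sketch, assuming BSD-style `ps aux` column order, those processes can be pulled out of a saved ps listing like this. The header line and the SW line marked below are hypothetical, added only for contrast; the other rows are the ones quoted above.

```python
# Sketch: list processes in uninterruptible sleep ("D" in the STAT column)
# from `ps aux`-style lines. Assumes the column order:
# USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
# On 2.4-era procps, "N" = low priority (niced) and "W" = no resident pages
# (typical of kernel threads such as nfsd, kswapd, kjournald).


def d_state_processes(ps_lines):
    """Return (pid, stat, command) for every line whose STAT begins with 'D'."""
    stuck = []
    for line in ps_lines:
        fields = line.split(None, 10)  # keep COMMAND (may contain spaces) whole
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header and any wrapped continuation lines
        stat = fields[7]
        if stat.startswith("D"):
            stuck.append((int(fields[1]), stat, fields[10]))
    return stuck


sample = [
    "USER  PID %CPU %MEM  VSZ  RSS TTY STAT START  TIME COMMAND",  # hypothetical header
    "root    7  0.1  0.0    0    0 ?   DW   Jun19 17:04 [kswapd]",
    "root  144  0.0  0.0    0    0 ?   DW   Jun19  6:53 [kjournald]",
    "root  851  0.0  0.0    0    0 ?   DW   Jun19  4:35 [nfsd]",
    "root 3414  0.8  0.1 3904 3048 ?   DN   04:02  1:45 /usr/bin/updatedb -f NFS,SMBFS,NCPFS,PROC,DEVPTS -e /tmp,/var/tmp,/us",
    "root  999  0.0  0.0    0    0 ?   SW   Jun19  0:00 [kjournald]",  # hypothetical SW line
]

for pid, stat, cmd in d_state_processes(sample):
    print(pid, stat, cmd)
```

Splitting with a maximum of 10 splits keeps the command and its arguments in one field, so long command lines such as the updatedb invocation survive intact.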
> > 
> > I'm wondering what my options are; this has happened ~10 times in
> > the last 6 months, although the system once went ~120 days without
> > a hiccup.  This last time, on 2.4.21-ac1, it lasted only 6 days.
> > It wouldn't be so bad if a `shutdown -r now` would restart it, but
> > it hangs while shutting down nfs and during killall, and has to be
> > hard rebooted.
> 
> This is almost certainly a lock deadlock of some sort.  I've had pretty
> good luck in debugging such problems just by running "sysrq-T" on the
> console and/or using "crash" to examine the running kernel.  This needs
> a fair amount of knowledge of the various locks in ext3.  The most
> common cause is a lock-ordering problem: one process starts a journal
> transaction and then blocks on a lock (e.g. a directory or inode
> semaphore, or the superblock lock), while some other process holds
> that lock and tries to start a new transaction when the journal is
> full.
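The deadlock described above is a cycle in the "who is waiting for whom" graph. The following is a toy illustration only, not ext3/JBD code: "A" and "B" are hypothetical processes, and the cycle check is an ordinary depth-first search for a back edge.

```python
# Toy wait-for graph for the lock-ordering deadlock described above.
# "A" and "B" are hypothetical processes; this is not ext3/JBD code.


def find_cycle(waits_for):
    """Return a cycle in a wait-for graph as a list of nodes, or None.

    waits_for maps each process to the processes it is blocked on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current DFS path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in waits_for.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in waits_for:
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# A opened a journal handle, then blocked on a directory semaphore held by B.
# B holds that semaphore and is starting a transaction on a full journal,
# so it waits for every open handle, including A's.
scenario = {"A": ["B"], "B": ["A"]}
print(find_cycle(scenario))
```

If B did not have to wait on A (for example, because the journal still had room), the graph would be `{"A": ["B"], "B": []}` and no cycle would be found.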
> 
> The journal being full is the crucial issue: if it isn't full, you can
> start a new transaction without problems, but when it is full you need
> to flush the journal and wait for all existing users to free up their
> handles, which will never happen if the first process has a
> transaction handle and is blocked waiting for a lock the second
> process is holding.
> 

Thanks for the quick response, Andreas,

If you could provide a little more instruction, it would be appreciated.
I'm guessing magic SysRq is required and sysrq-T means
Alt+PrintScreen+T?  What kind of information does this provide, and
what should I do with it?

Thanks,

Dale

> Cheers, Andreas
> --
> Andreas Dilger
> http://sourceforge.net/projects/ext2resize/
> http://www-mddsp.enel.ucalgary.ca/People/adilger/
> 

_______________________________________________

Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users