--- Andreas Dilger <adilger@clusterfs.com> wrote:
> On Jun 26, 2003 06:46 -0700, Dale wrote:
> > I have a problem with an NFS server for my network. It has run
> > kernels 2.4.18-ac4 through 2.4.21-ac1, all with problems. The -ac
> > patches are used to provide the new-style quota support. The system
> > seems to have gotten even less stable with the new kernel versions.
> >
> > This morning around 5 am, I got a page that the system was not
> > responding to NFS requests. I ssh'd in and found the loadavg at ~50.
> > Below are some snippets from ps at the time:
> >
> > root 3414 0.8 0.1 3904 3048 ? DN 04:02 1:45
> >   /usr/bin/updatedb -f NFS,SMBFS,NCPFS,PROC,DEVPTS -e /tmp,/var/tmp,/us
> > root 3979 0.0 0.0 2588 1192 ? DN 04:14 0:00
> >   /usr/bin/rsync -aH --delete /home/puser1 /home/puser2 /home/puser3
> >
> > The rsync command is backing up across the network to a backup NFS
> > server. updatedb starts at 4:02 am, and the rsync had been running
> > since 3:30 and was half-way completed (estimated by the 'p' in the
> > username).
> >
> > Also there were 32 nfsd's just like this:
> > root 851 0.0 0.0 0 0 ? DW Jun19 4:35 [nfsd]
> >
> > and these; the other 4 kjournald's were in SW.
> > root 7 0.1 0.0 0 0 ? DW Jun19 17:04 [kswapd]
> > root 144 0.0 0.0 0 0 ? DW Jun19 6:53 [kjournald]
> >
> > I'm wondering what my options are. This has happened ~10 times in
> > the last 6 months, although the system went through a period of
> > ~120 days without a hiccup. This last time, on 2.4.21-ac1, it
> > lasted only 6 days.
> > It wouldn't be so bad if a `shutdown -r now` would restart it, but
> > it hangs while shutting down nfs and during killall, and needs a
> > hard reboot.
>
> This almost certainly is a lock deadlock of some sort. I've had pretty
> good luck in debugging such problems just by running "sysrq-T" on the
> console and/or using "crash" to examine the running kernel. This needs
> a fair amount of knowledge of the various locks in ext3. The most
> common problems are lock ordering problems: some process starts a
> journal transaction and then blocks on a lock (e.g. directory or inode
> semaphore, or superblock lock), while some other process holds that
> lock and tries to start a new transaction when the journal is full.
>
> The journal being full is a crucial issue, because if it isn't full
> you can start a new transaction without problems, but when it is full
> you need to flush the journal and wait for all existing users to free
> up their handles, which will never happen if the first process has a
> transaction handle and is blocked waiting for a lock the second
> process is holding.
>

Thanks for the quick response, Andreas.

If you could provide a little more instruction, it would be
appreciated. I'm guessing magic sysrq is required and sysrq-T means
ALT+PrintScreen+T? What kind of information does this provide, and what
should I do with it?

Thanks,
Dale

> Cheers, Andreas
> --
> Andreas Dilger
> http://sourceforge.net/projects/ext2resize/
> http://www-mddsp.enel.ucalgary.ca/People/adilger/
>

_______________________________________________
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
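
The deadlock Andreas describes can be pictured as a cycle in a "who is waiting for whom" graph. The following is only a rough sketch of that reasoning, not ext3 or kernel code: the process names "A" and "B" and the helpers `build_waits_for` and `find_cycle` are hypothetical, and the model assumes exactly two processes and one lock.

```python
from typing import Dict, List, Optional


def build_waits_for(journal_full: bool) -> Dict[str, List[str]]:
    """Toy model of the scenario in the thread (names are hypothetical).

    A: holds an open transaction handle, blocked on a lock held by B.
    B: holds that lock and wants to start a new transaction.
    """
    graph: Dict[str, List[str]] = {"A": ["B"], "B": []}
    if journal_full:
        # With a full journal, B must wait for the journal to be flushed,
        # which requires every open handle (including A's) to be released.
        graph["B"] = ["A"]
    return graph


def find_cycle(waits_for: Dict[str, List[str]]) -> Optional[List[str]]:
    """Depth-first search; return one cycle as a list of nodes, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in waits_for}
    stack: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        stack.append(node)
        for nxt in waits_for.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: a wait cycle.
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        colour[node] = BLACK
        return None

    for node in list(waits_for):
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    for full in (False, True):
        print("journal full:", full, "cycle:", find_cycle(build_waits_for(full)))
```

In this toy model the cycle only appears when the journal is full, which matches the point in the quoted message that a non-full journal lets the second process start its transaction without waiting on the first.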