Sorry for the several-month delay, but the server had stopped "crashing"
during that period, so I hadn't fully reported back. I have changed the
backup script and can now trigger the "crash" almost at will. I'm including
the prior conversation for completeness.

/home is a 325 GB partition on a hardware RAID controller. The system is now
running a vanilla 2.4.22 kernel (vfsv0 quota support is now included, so the
-ac patches are no longer needed). This is what the backup command was
changed to:

  /bin/nice -n 19 /usr/bin/find /home/ -type d -mindepth 1 -maxdepth 1 -exec /usr/bin/rsync -aH {} backup@backupserver::backup/home \;

This command locks up almost daily. The system is still responsive, but no
data can be written to /home/ during this time; 'touch /home/tmp' just sits
there...

I hadn't been able to get to the system in the past, but now that I can
reproduce the crash almost at will, I'll be better able to test the
situation. Thanks for your prior and current help; I'm looking forward to
suggestions on how I can track down what's locking and find a solution.

Thanks,
Dale

--- Andreas Dilger <adilger@xxxxxxxxxxxxx> wrote:
> On Jun 26, 2003 06:46 -0700, Dale wrote:
> > I have a problem with an NFS server for my network. It has run kernels
> > 2.4.18-ac4 through 2.4.21-ac1, all with problems. The -ac patches are
> > used to provide the new-style quota support. The system seems to have
> > gotten even less stable with the newer kernel versions.
> >
> > This morning around 5 am, I got a page that the system was not
> > responding to NFS requests. I ssh'd in and found the loadavg at ~50.
> > Below are some snippets from ps at the time:
> >
> > root 3414 0.8 0.1 3904 3048 ? DN 04:02 1:45 /usr/bin/updatedb -f NFS,SMBFS,NCPFS,PROC,DEVPTS -e /tmp,/var/tmp,/us
> > root 3979 0.0 0.0 2588 1192 ? DN 04:14 0:00 /usr/bin/rsync -aH --delete /home/puser1 /home/puser2 /home/puser3
> >
> > The rsync command is backing up across the network to a backup NFS
> > server.
> > updatedb starts at 4:02 am, and the rsync had been running since 3:30
> > and was about half-way done (judging by the 'p' in the username).
> >
> > There were also 32 nfsd's just like this one:
> > root 851 0.0 0.0 0 0 ? DW Jun19 4:35 [nfsd]
> >
> > and these (the other 4 kjournald's were in SW):
> > root 7 0.1 0.0 0 0 ? DW Jun19 17:04 [kswapd]
> > root 144 0.0 0.0 0 0 ? DW Jun19 6:53 [kjournald]
> >
> > I'm wondering what my options are. This has happened ~10 times in the
> > last 6 months, although the system once went ~120 days without a
> > hiccup. This last time, on 2.4.21-ac1, it was only 6 days.
> > It wouldn't be so bad if a `shutdown -r now` would restart it, but it
> > hangs while shutting down nfs and during killall, and has to be
> > hard-rebooted.
>
> This almost certainly is a lock deadlock of some sort. I've had pretty
> good luck debugging such problems just by running "sysrq-T" on the
> console and/or using "crash" to examine the running kernel. This needs a
> fair amount of knowledge of the various locks in ext3. The most common
> problems are lock-ordering problems: some process starts a journal
> transaction and then blocks on a lock (e.g. a directory or inode
> semaphore, or the superblock lock), while some other process holds that
> lock and tries to start a new transaction when the journal is full.
>
> The journal being full is the crucial issue. If it isn't full, you can
> start a new transaction without problems, but when it is full you need to
> flush the journal and wait for all existing users to release their
> handles. That will never happen if the first process holds a transaction
> handle and is blocked waiting for a lock the second process is holding.
>
> Cheers, Andreas
> --
> Andreas Dilger
> http://sourceforge.net/projects/ext2resize/
> http://www-mddsp.enel.ucalgary.ca/People/adilger/

_______________________________________________
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users
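Since the hang in the thread above is now reproducible and shows up as a write to /home that never returns, Andreas's sysrq-T suggestion could be automated so the task dump is taken at the moment of the hang. The following is a minimal, hypothetical POSIX sh sketch, not something from the thread. It assumes root access, procps `ps`, and a kernel with magic SysRq enabled that exposes /proc/sysrq-trigger (if the running 2.4.22 kernel does not, Alt-SysRq-T on the console does the same thing). The function names, the probe file `.hang-probe`, and the 60-second limit are all illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: detect when writes to a filesystem stall and
# capture task state at that moment. All function names are illustrative.

# Touch a probe file in the directory under test. On ext3, updating the
# timestamp needs a journal handle, so this blocks if the journal is stuck.
probe_write() {
    touch "$1/.hang-probe"
}

# write_stalls DIR SECONDS
# Returns 0 if probe_write on DIR has not completed after about SECONDS
# seconds, 1 if it completed in time, 2 if the state directory can't be made.
# Completion is recorded in a file under mktemp's directory (normally /tmp),
# never under DIR, and the probe is never waited on, so a probe stuck in D
# state cannot hang this function too.
write_stalls() {
    dir=$1
    limit=$2
    state=$(mktemp -d) || return 2
    ( probe_write "$dir"; : > "$state/done" ) >/dev/null 2>&1 &
    i=0
    while [ "$i" -lt "$limit" ]; do
        if [ -e "$state/done" ]; then
            rm -rf "$state"
            return 1
        fi
        sleep 1
        i=$((i + 1))
    done
    if [ -e "$state/done" ]; then
        rm -rf "$state"
        return 1
    fi
    rm -rf "$state"
    return 0
}

# Print the ps header plus every process in uninterruptible sleep (STAT D),
# with the kernel function it is waiting in (WCHAN). Assumes procps ps.
list_d_state() {
    ps -eo pid,stat,wchan:30,args | awk 'NR == 1 || $2 ~ /^D/'
}

# Same as Alt-SysRq-T on the console: dump every task's state and stack to
# the kernel log. Needs root, magic SysRq enabled, and /proc/sysrq-trigger.
dump_tasks() {
    if [ -w /proc/sysrq-trigger ]; then
        echo t > /proc/sysrq-trigger
    else
        echo "dump_tasks: /proc/sysrq-trigger is not writable" >&2
        return 1
    fi
}

# One-shot check of /home with a 60-second limit (both assumptions).
check_home() {
    if write_stalls /home 60; then
        date
        list_d_state
        dump_tasks
    fi
}
```

Running check_home as root from cron every few minutes during the backup window would, when it fires, send the ps listing to cron's output (typically mailed) and put the SysRq-T stack dump in the kernel log (dmesg, or syslog if it logs somewhere other than /home). Two caveats of this sketch: every run during an ongoing hang leaves one more touch process stuck in D state, and a full task dump on a busy server can overflow the kernel log buffer, so a serial console may capture it more reliably.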