On Jun 26, 2003 12:19 -0700, Dale wrote:
> --- Andreas Dilger <adilger@clusterfs.com> wrote:
> > This almost certainly is a lock deadlock of some sort.  I've had
> > pretty good luck in debugging such problems just by running "sysrq-T"
> > on the console and/or using "crash" to examine the running kernel.  This
> > needs a fair amount of knowledge of the various locks in ext3.  The most
> > common problems are related to lock ordering problems with some process
> > starting a journal transaction and then blocking on a lock (e.g. directory
> > or inode semaphore, or superblock lock), and some other process holding
> > that lock and trying to start a new transaction when the journal is full.
> >
> > The journal being full is a crucial issue, because if it isn't full you
> > can start a new transaction without problems, but when it is full you
> > need to flush the journal and wait for all existing users to free up
> > their handles, which will never happen if the first process has a
> > transaction handle and is blocked waiting for a lock the second process
> > is holding.
>
> If you could provide a little more instruction it would be appreciated.
> I'm guessing magic sysrq is required and sysrq-T means ALT+PrintScreen+T?

Correct.  You can also use the "crash" tool (based on GDB) to get this
information, but I'm not sure whether it requires kernel patches in order
to work properly.

> What kind of information does this provide and what should I do with it?

This dumps a stack trace of every process currently on the system to the
console.  You need to do this while you are experiencing the lockup,
obviously.  Unless you have in-kernel symbol decoding, you will also need
to run the output through ksymoops in order to get anything meaningful
from it.  Interesting processes would include kswapd, kupdated, kjournald,
and any of the other hanging processes, although there will likely be a
lot of "secondary casualties" from the original deadlock.

You should be able to see which processes are deadlocked by running
"ps auxww" and looking for those stuck in disk wait ("D" in the STAT
column).  At that point, probably one process will be in __down_failed(),
and a bunch of others will be in start_this_handle() or similar, and
kjournald will be waiting on the journal to be cleared.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/

_______________________________________________
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
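
The "ps auxww" check described above can be scripted if you want to capture the list of stuck processes quickly during a lockup. What follows is only a rough sketch, not something from the original thread: the function names find_disk_wait() and list_blocked_now() are made up for illustration, and it assumes the usual procps layout where the header row contains PID and STAT columns and COMMAND is the last column.

```python
import subprocess


def find_disk_wait(ps_output):
    """Return (pid, stat, command) for every process whose STAT starts with "D".

    ps_output is the text printed by "ps auxww".  Column positions are taken
    from the header row, so the exact set of columns is not hard-coded
    (assumption: the header has PID and STAT, and COMMAND comes last).
    """
    lines = ps_output.strip().splitlines()
    if not lines:
        return []

    header = lines[0].split()
    pid_idx = header.index("PID")
    stat_idx = header.index("STAT")
    cmd_idx = len(header) - 1

    blocked = []
    for line in lines[1:]:
        # Split at most cmd_idx times so a command line with spaces stays whole.
        fields = line.split(None, cmd_idx)
        if len(fields) <= cmd_idx:
            continue  # short or malformed row, skip it
        if fields[stat_idx].startswith("D"):
            blocked.append(
                (int(fields[pid_idx]), fields[stat_idx], fields[cmd_idx].rstrip())
            )
    return blocked


def list_blocked_now():
    """Run "ps auxww" on this machine and return the processes in disk wait."""
    result = subprocess.run(
        ["ps", "auxww"], capture_output=True, text=True, check=True
    )
    return find_disk_wait(result.stdout)
```

Calling list_blocked_now() while the machine is hung gives the PIDs to look for in the sysrq-T output. A process briefly in "D" during normal I/O is not a problem; the ones that stay there across repeated runs are the candidates.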