kjournald blocked in D state

Mike Miller <mike.miller@xxxxxx> · Thu, 17 Jun 2010 11:08:14 -0500

I have a system on which kjournald becomes blocked in D state quite often.
Looking at a core file, we have 5 mounted ext3 filesystems:

crash> mount
    VFSMOUNT         SUPERBLK     TYPE   DEVNAME           DIRNAME
     10037e07b00      10037e4ec00 rootfs rootfs            /         
     10037e07ec0      10037e4e400 proc   /proc             /proc     
     10037e07d40      102188abc00 tmpfs  none              /dev      
     10037e07e00      102188b2400 ext3   /dev/root         /         
     10037e07200      102188abc00 tmpfs  none              /dev      
     10037e07140      10037e4e400 proc   /proc             /proc     
     1021652bc00      102188b1c00 usbfs  /proc/bus/usb     /proc/bus/usb
     1021652bf00      10037e4c400 sysfs  /sys              /sys      
     1021652bb40      10006967400 devpts devpts            /dev/pts  
     1021652b180      100dfeda400 ext3   /dev/cciss/c0d0p1 /boot     
     1021652b240      100dfecb800 ext3   /dev/sys/home     /home     
     1021652b300      100dfecbc00 ext3   /dev/sys/tmp      /tmp      
     1021652b3c0      100dfeda800 ext3   /dev/sys/var      /var      
     1021652b480      100dfedac00 tmpfs  tmpfs             /dev/shm  
     1021652bcc0      100dfecb400 binfmt_misc none         /proc/sys/fs/binfmt_misc

So we have 5 corresponding journal threads:

crash> ps | grep kjournald
    626      1   2     10218109030    IN   0.0       0      0  [kjournald]
   3015      1   0     102168f2030    IN   0.0       0      0  [kjournald]
   3016      1   1     102168f27f0    UN   0.0       0      0  [kjournald]
   3017      1   1     1021837b030    IN   0.0       0      0  [kjournald]
   3018      1   7     10216fd0030    UN   0.0       0      0  [kjournald]

2 are in the UNINTERRUPTIBLE state, but only PID 3018 shows __wait_on_buffer
in its stack:

crash> bt -f 3018
PID: 3018   TASK: 10216fd0030       CPU: 7   COMMAND: "kjournald"
-----snip-----
 #2 [10215a83b30] __wait_on_buffer at ffffffff8017d504
    10215a83b38: 000001005fa12ce8 0000000000000000 
    10215a83b48: 0000010216fd0030 ffffffff8017d38a 
    10215a83b58: 0000010215a83b88 0000010215a83b88 
    10215a83b68: 000001005fa12ce8 0000000000000000 
    10215a83b78: 0000010216fd0030 ffffffff8017d38a 
    10215a83b88: ffffffff804ac808 ffffffff804ac808 
    10215a83b98: 000001005fa12ce8 0000000000000001 
    10215a83ba8: 000001004f4e90e0 ffffffffa0080ffe 
-----snip-----

I'm not a crash expert, so I then looked at the last address pushed onto its
stack and traced it down to the inode semaphore:

crash> struct file.f_dentry 000001005fa12ce8
  f_dentry = 0x1021f4e5510, 
crash> struct dentry.d_inode 0x1021f4e5510
  d_inode = 0x100c95c17c0, 
crash> struct inode.i_sem 0x100c95c17c0
  i_sem = {
    count = {
      counter = -916711312 <-------------------- This looks wrong
    }, 
    sleepers = 256, 
    wait = {
      lock = {
        lock = 497690456, 
        magic = 258
      }, 
      task_list = {
        next = 0x100000000000, <--------------- This also looks wrong 
        prev = 0x30f75c3
      }
    }
  }, 

At this point I'm not sure how to continue, or even whether I went down the
right path. From this info, can anyone tell what's wrong? Or did I take a
wrong turn somewhere in reaching this conclusion?
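
One way to sanity-check the two flagged i_sem fields is to print them in hex
instead of decimal. The sketch below is only an illustration. It assumes a
little-endian 64-bit machine and that count.counter/sleepers and
wait.lock.lock/wait.lock.magic are adjacent 32-bit fields, as the crash output
suggests; the helper names to_u32 and join_le are hypothetical.

```python
# Reinterpret the decimal fields printed by crash as raw 32-bit words.
# Assumption: little-endian layout, each pair of fields is contiguous.

def to_u32(value):
    """Return the unsigned 32-bit bit pattern of a (possibly negative) int."""
    return value & 0xFFFFFFFF

def join_le(low, high):
    """Combine two adjacent 32-bit fields into one 64-bit little-endian word."""
    return (to_u32(high) << 32) | to_u32(low)

INODE_ADDR = 0x100c95c17c0          # d_inode value from the crash session

# count.counter = -916711312, sleepers = 256
count_sleepers = join_le(-916711312, 256)

# wait.lock.lock = 497690456, wait.lock.magic = 258
lock_magic = join_le(497690456, 258)

print(hex(to_u32(-916711312)))          # counter as a raw 32-bit word
print(hex(count_sleepers))              # counter + sleepers as one 64-bit word
print(hex(count_sleepers - INODE_ADDR)) # distance from the inode address
print(hex(lock_magic))                  # lock + magic as one 64-bit word
```

If the combined words look like kernel addresses in the same range as the
other pointers in the session, that may mean the bytes at that offset hold
pointers rather than a semaphore, for example because 000001005fa12ce8 is not
really a struct file, or because the debuginfo does not match the running
kernel. This is one possible reading, not a diagnosis.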

-- mikem 
In this case /home is a heavily accessed filesystem. 

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users