Re: Storage server, hung tasks and tracebacks

On Mon, May 21, 2012 at 09:59:03AM +1000, Dave Chinner wrote:
> You need to provide the output of sysrq-W at this point ('echo w >
> /proc/sysrq-trigger') so we can see where these are hung. the entire
> dmesg would also be useful....

Thank you for this advice, Dave.

Attached is the full dmesg output after another hang. The sysrq output is
near the end, at timestamp 250695.

For this test, I built a fresh XFS filesystem (you can see this at timestamp
246909). I forgot to mount with "inode64" this time, but it doesn't seem to
have made a difference. I also ran "swapoff" before starting the test, to
ensure that swapping to sda was not part of the problem.
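For reference, the setup was roughly the following (reconstructed from memory, device names as on this box; the commented-out line shows the inode64 mount I forgot this run):

```shell
# Rough reconstruction of the test setup on storage3. Run as root;
# note that mkfs.xfs destroys any existing data on the device.
swapoff -a                          # rule out swapping to sda
mkfs.xfs -f /dev/md127              # fresh filesystem on the RAID0 array
mount /dev/md127 /disk/scratch      # mounted without inode64 this run
# mount -o inode64 /dev/md127 /disk/scratch   # the option I forgot

# Then four parallel bonnie++ instances, one per terminal:
bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
```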

A quick status summary from the hung system:

    root@storage3:~# free
                 total       used       free     shared    buffers     cached
    Mem:       8156224    8052824     103400          0       4808    7399112
    -/+ buffers/cache:     648904    7507320
    Swap:            0          0          0
    root@storage3:~# uptime
     10:46:47 up 2 days, 21:43,  1 user,  load average: 10.00, 9.81, 8.16
    root@storage3:~# ps auxwww | grep -v ' S'
    root        34  2.9  0.0      0     0 ?        D    May18 122:14 [kswapd0]
    root      1387  0.0  0.0  15976   504 ?        Ds   May18   0:18 /usr/sbin/irqbalance
    root      5242  0.0  0.0      0     0 ?        D    09:39   0:02 [xfsaild/md127]
    tomi      6249  4.2  0.0 378860  3844 pts/1    D+   09:40   2:48 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    tomi      6251  4.1  0.0 378860  3836 pts/2    D+   09:40   2:44 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    tomi      6253  4.1  0.0 378860  3848 pts/3    D+   09:40   2:46 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    tomi      6255  4.0  0.0 378860  3840 pts/4    D+   09:40   2:40 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
    root      7795  0.1  0.0      0     0 ?        D    10:27   0:02 [kworker/0:3]
    root      8517  0.0  0.0  16876  1272 pts/0    R+   10:46   0:00 ps auxwww
    root     24420  0.0  0.0      0     0 ?        D    00:50   0:00 [kworker/3:0]
    root@storage3:~# cat /proc/mdstat
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4]
    [raid10] 
    md127 : active raid0 sds[17] sdx[22] sdj[8] sdt[18] sdk[9] sdc[1] sdb[0]
    sdh[6] sdu[19] sdi[7] sdn[12] sdo[13] sdv[20] sdm[11] sdq[15] sdp[14]
    sdl[10] sdw[21] sdg[5] sdr[16] sde[3] sdy[23] sdd[2] sdf[4]
          70326362112 blocks super 1.2 1024k chunks
          
    unused devices: <none>
    root@storage3:~# mount
    /dev/sda1 on / type ext4 (rw,errors=remount-ro)
    proc on /proc type proc (rw,noexec,nosuid,nodev)
    sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
    none on /sys/fs/fuse/connections type fusectl (rw)
    none on /sys/kernel/debug type debugfs (rw)
    none on /sys/kernel/security type securityfs (rw)
    udev on /dev type devtmpfs (rw,mode=0755)
    devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
    tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
    none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
    none on /run/shm type tmpfs (rw,nosuid,nodev)
    rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
    /dev/md127 on /disk/scratch type xfs (rw)
    root@storage3:~# df
    Filesystem       1K-blocks      Used   Available Use% Mounted on
    /dev/sda1        967415188  16754388   902241696   2% /
    udev               4069104         4     4069100   1% /dev
    tmpfs              1631248       380     1630868   1% /run
    none                  5120         0        5120   0% /run/lock
    none               4078112         0     4078112   0% /run/shm
    /dev/md127     70324275200 258902416 70065372784   1% /disk/scratch
    root@storage3:~# 

(Aside: when the test started, the load average was just above 4,
corresponding to the four bonnie++ processes.)
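The load average of 10 lines up with the number of tasks stuck in uninterruptible sleep in the ps listing above (kswapd0, xfsaild, the four bonnie++ instances, and so on). A generic way to list just those tasks, rather than grepping the full ps output, is:

```shell
# List tasks in uninterruptible sleep (STAT starting with "D", which
# also matches "Ds" and "D+"). These count toward the Linux load
# average even though they consume no CPU.
ps -eo pid,stat,comm | awk 'NR > 1 && $2 ~ /^D/'
```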

"iostat 5" shows zero activity on the MD RAID and all of its member disks:

    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               0.00    0.00    0.00    0.05    0.00   99.95

    Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
    sda               0.20         1.60         0.00          8          0
    sdf               0.00         0.00         0.00          0          0
    sde               0.00         0.00         0.00          0          0
    sdd               0.00         0.00         0.00          0          0
    sdc               0.00         0.00         0.00          0          0
    sdg               0.00         0.00         0.00          0          0
    sdh               0.00         0.00         0.00          0          0
    sdp               0.00         0.00         0.00          0          0
    sdj               0.00         0.00         0.00          0          0
    sdq               0.00         0.00         0.00          0          0
    sdk               0.00         0.00         0.00          0          0
    sdb               0.00         0.00         0.00          0          0
    sdl               0.00         0.00         0.00          0          0
    sdo               0.00         0.00         0.00          0          0
    sdm               0.00         0.00         0.00          0          0
    sdn               0.00         0.00         0.00          0          0
    sdi               0.00         0.00         0.00          0          0
    sdr               0.00         0.00         0.00          0          0
    sdu               0.00         0.00         0.00          0          0
    sdv               0.00         0.00         0.00          0          0
    sdw               0.00         0.00         0.00          0          0
    sdy               0.00         0.00         0.00          0          0
    sdx               0.00         0.00         0.00          0          0
    sds               0.00         0.00         0.00          0          0
    sdt               0.00         0.00         0.00          0          0
    md127             0.00         0.00         0.00          0          0

Anything you can determine from this info would be much appreciated!

Regards,

Brian.

Attachment: storage3-dmesg.txt.gz
Description: application/gunzip

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
