On Mon, May 21, 2012 at 09:59:03AM +1000, Dave Chinner wrote:
> You need to provide the output of sysrq-W at this point ('echo w >
> /proc/sysrq-trigger') so we can see where these are hung. the entire
> dmesg would also be useful....

Thank you for this advice, Dave. Attached is the full dmesg output after
another hang. The sysrq output is near the end, at timestamp 250695.

For this test, I built a fresh XFS filesystem (you can see this at
timestamp 246909) - I forgot to mount with "inode64" this time, but it
doesn't seem to have made a difference. I also did "swapoff" before
starting the test, to ensure that swapping to sda was not part of the
problem. (The commands I used are summarized in the P.S. below.)

A quick status summary from the hung system:

root@storage3:~# free
             total       used       free     shared    buffers     cached
Mem:       8156224    8052824     103400          0       4808    7399112
-/+ buffers/cache:     648904    7507320
Swap:            0          0          0

root@storage3:~# uptime
 10:46:47 up 2 days, 21:43,  1 user,  load average: 10.00, 9.81, 8.16

root@storage3:~# ps auxwww | grep -v ' S'
root        34  2.9  0.0      0     0 ?        D    May18 122:14 [kswapd0]
root      1387  0.0  0.0  15976   504 ?        Ds   May18   0:18 /usr/sbin/irqbalance
root      5242  0.0  0.0      0     0 ?        D    09:39   0:02 [xfsaild/md127]
tomi      6249  4.2  0.0 378860  3844 pts/1    D+   09:40   2:48 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi      6251  4.1  0.0 378860  3836 pts/2    D+   09:40   2:44 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi      6253  4.1  0.0 378860  3848 pts/3    D+   09:40   2:46 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
tomi      6255  4.0  0.0 378860  3840 pts/4    D+   09:40   2:40 bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000
root      7795  0.1  0.0      0     0 ?        D    10:27   0:02 [kworker/0:3]
root      8517  0.0  0.0  16876  1272 pts/0    R+   10:46   0:00 ps auxwww
root     24420  0.0  0.0      0     0 ?        D    00:50   0:00 [kworker/3:0]

root@storage3:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active raid0 sds[17] sdx[22] sdj[8] sdt[18] sdk[9] sdc[1] sdb[0] sdh[6] sdu[19] sdi[7] sdn[12] sdo[13] sdv[20] sdm[11] sdq[15] sdp[14] sdl[10] sdw[21] sdg[5] sdr[16] sde[3] sdy[23] sdd[2] sdf[4]
      70326362112 blocks super 1.2 1024k chunks

unused devices: <none>

root@storage3:~# mount
/dev/sda1 on / type ext4 (rw,errors=remount-ro)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
/dev/md127 on /disk/scratch type xfs (rw)

root@storage3:~# df
Filesystem        1K-blocks       Used    Available Use% Mounted on
/dev/sda1         967415188   16754388    902241696   2% /
udev                4069104          4      4069100   1% /dev
tmpfs               1631248        380      1630868   1% /run
none                   5120          0         5120   0% /run/lock
none                4078112          0      4078112   0% /run/shm
/dev/md127      70324275200  258902416  70065372784   1% /disk/scratch
root@storage3:~#

(Aside: when the test started, the load average was just above 4, for the
four bonnie++ processes.)

"iostat 5" shows zero activity to the MD RAID:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00    0.05    0.00   99.95

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
sda               0.20         1.60         0.00          8          0
sdf               0.00         0.00         0.00          0          0
sde               0.00         0.00         0.00          0          0
sdd               0.00         0.00         0.00          0          0
sdc               0.00         0.00         0.00          0          0
sdg               0.00         0.00         0.00          0          0
sdh               0.00         0.00         0.00          0          0
sdp               0.00         0.00         0.00          0          0
sdj               0.00         0.00         0.00          0          0
sdq               0.00         0.00         0.00          0          0
sdk               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0
sdl               0.00         0.00         0.00          0          0
sdo               0.00         0.00         0.00          0          0
sdm               0.00         0.00         0.00          0          0
sdn               0.00         0.00         0.00          0          0
sdi               0.00         0.00         0.00          0          0
sdr               0.00         0.00         0.00          0          0
sdu               0.00         0.00         0.00          0          0
sdv               0.00         0.00         0.00          0          0
sdw               0.00         0.00         0.00          0          0
sdy               0.00         0.00         0.00          0          0
sdx               0.00         0.00         0.00          0          0
sds               0.00         0.00         0.00          0          0
sdt               0.00         0.00         0.00          0          0
md127             0.00         0.00         0.00          0          0

Anything you can determine from this info would be much appreciated!

Regards,

Brian.
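P.S. For reference, this is roughly how I set up and started the test. It
is reconstructed from memory and from the ps output above rather than from
shell history, so the mkfs and swapoff invocations are approximate (I don't
recall the exact options); the bonnie++ command line is exactly as ps shows
it, and I actually launched the four copies by hand in four separate
terminals (pts/1..pts/4) - the loop below is just a compact equivalent.

    # rebuild the scratch filesystem on the MD RAID0 array
    mkfs.xfs -f /dev/md127            # options approximate; defaults otherwise
    mount /dev/md127 /disk/scratch    # note: no inode64 this time (forgotten)

    # make sure swapping to sda cannot be involved
    swapoff -a                        # or swapoff on the sda swap partition specifically

    # four parallel bonnie++ runs (in reality one per terminal)
    for i in 1 2 3 4; do
        bonnie++ -d /disk/scratch/test -s 16384k -n 98:800k:500k:1000 &
    done
    wait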
Attachment:
storage3-dmesg.txt.gz
Description: application/gunzip
_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs