On Tue, Apr 29, 2014 at 03:47:15PM -0600, Bob Mastors wrote:
> On Mon, Apr 28, 2014 at 7:01 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Mon, Apr 28, 2014 at 05:51:31PM -0600, Bob Mastors wrote:
> > > Log output attached.
> > > The xfs filesystem being mounted and unmounted is the only xfs filesystem
> > > on the system.
> > > Bob
> > >
> > > On Mon, Apr 28, 2014 at 5:45 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > > > On Mon, Apr 28, 2014 at 04:29:02PM -0600, Bob Mastors wrote:
> > > > > Greetings,
> > > > >
> > > > > I have an xfs umount hang caused by forcing the block device to return
> > > > > i/o errors while copying files to the filesystem.
> > > > > Detailed steps to reproduce the problem on virtualbox are below.
> > > > >
> > > > > The linux version is a recent pull and reports as 3.15.0-rc3.
> > > > >
> > > > > [ 2040.248096] INFO: task umount:10303 blocked for more than 120 seconds.
> > > > > [ 2040.323947] Not tainted 3.15.0-rc3 #4
> > > > > [ 2040.343423] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > > > [ 2040.352665] umount          D ffffffff8180fe40     0 10303   8691 0x00000000
> > > > > [ 2040.404918]  ffff88001e33dd58 0000000000000086 ffff88001e33dd48 ffffffff81080f82
> > > > > [ 2040.489901]  ffff88001b311900 0000000000013180 ffff88001e33dfd8 0000000000013180
> > > > > [ 2040.534772]  ffff88003daa3200 ffff88001b311900 ffff88002421aec0 ffff88002421ae80
> > > > > [ 2040.587450] Call Trace:
> > > > > [ 2040.592176]  [<ffffffff81080f82>] ? try_to_wake_up+0x232/0x2b0
> > > > > [ 2040.620212]  [<ffffffff816c54d9>] schedule+0x29/0x70
> > > > > [ 2040.627685]  [<ffffffffa04c4cd6>] xfs_ail_push_all_sync+0x96/0xd0 [xfs]
> > > > > [ 2040.632236]  [<ffffffff81092230>] ? __wake_up_sync+0x20/0x20
> > > > > [ 2040.659105]  [<ffffffffa04731a3>] xfs_unmountfs+0x63/0x160 [xfs]
> > > > > [ 2040.691774]  [<ffffffffa0478f65>] ? kmem_free+0x35/0x40 [xfs]
> > > > > [ 2040.698610]  [<ffffffffa0474cf5>] xfs_fs_put_super+0x25/0x60 [xfs]
> > > > > [ 2040.706838]  [<ffffffff8119561e>] generic_shutdown_super+0x7e/0x100
> > > > > [ 2040.723958]  [<ffffffff811956d0>] kill_block_super+0x30/0x80
> > > > > [ 2040.734963]  [<ffffffff8119591d>] deactivate_locked_super+0x4d/0x80
> > > > > [ 2040.745485]  [<ffffffff8119652e>] deactivate_super+0x4e/0x70
> > > > > [ 2040.751274]  [<ffffffff811b1d42>] mntput_no_expire+0xd2/0x160
> > > > > [ 2040.755894]  [<ffffffff811b2fff>] SyS_umount+0xaf/0x3b0
> > > > > [ 2040.761032]  [<ffffffff816d1592>] system_call_fastpath+0x16/0x1b
> > > > > [ .060058] XFS (sdb): xfs_log_force: error 5 returned.
> > > > > [ 268059] XFS (sdb): xfs_log_force: error 5 returned.
> > > > >
> > > > > I took a look at xfs_ail_push_all_sync and it is pretty easy to see
> > > > > the hang. But it is not obvious to me how to fix it.
> > > > > Any ideas would be appreciated.
> > > > >
> > > > > I am available to run additional tests or capture more logging
> > > > > or whatever if that would help.
> > > >
> > > > What's the entire log output from the first shutdown message?
> >
> > So what is the AIL stuck on? Can you trace the xfs_ail* trace points
> > when it is in shutdown like this and post the output of the report?
>
> # tracer: function
> #
> # entries-in-buffer/entries-written: 32/32   #P:4
> #
> #                              _-----=> irqs-off
> #                             / _----=> need-resched
> #                            | / _---=> hardirq/softirq
> #                            || / _--=> preempt-depth
> #                            ||| /     delay
> #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
> #              | |       |   ||||       |         |
> umount-2995 [003] .... 1125.524090: xfs_unmountfs <-xfs_fs_put_super
> umount-2995 [003] .... 1125.524099: xfs_ail_push_all_sync <-xfs_unmountfs
> umount-2995 [003] .... 1125.527498: xfs_ail_push_all_sync <-xfs_log_quiesce
> kswapd0-43 [002] .... 1148.078330: xfs_ail_push_all <-xfs_reclaim_inodes_nr
> kswapd0-43 [002] .... 1148.078332: xfs_ail_push <-xfs_ail_push_all
> xfsaild/sdb-3005 [002] .... 1148.128311: xfs_ail_min_lsn <-xfsaild
> kswapd0-43 [001] .... 1149.971684: xfs_ail_push_all <-xfs_reclaim_inodes_nr
> kswapd0-43 [001] .... 1149.971695: xfs_ail_push <-xfs_ail_push_all
> xfsaild/sdb-3005 [003] .... 1150.023104: xfs_ail_min_lsn <-xfsaild
> umount-3013 [000] .N.. 1153.035350: xfs_unmountfs <-xfs_fs_put_super
> umount-3013 [000] .... 1153.093488: xfs_ail_push_all_sync <-xfs_unmountfs
> umount-3013 [000] .... 1153.122459: xfs_ail_push_all_sync <-xfs_log_quiesce
> kswapd0-43 [003] .... 1186.232408: xfs_ail_push_all <-xfs_reclaim_inodes_nr
> kswapd0-43 [003] .... 1186.232411: xfs_ail_push <-xfs_ail_push_all
> xfsaild/sdb-3035 [001] .... 1186.288258: xfs_ail_min_lsn <-xfsaild
> umount-3043 [001] .... 1188.653981: xfs_unmountfs <-xfs_fs_put_super
> umount-3043 [001] .... 1188.653991: xfs_ail_push_all_sync <-xfs_unmountfs
> kworker/0:0-2865 [000] .... 1199.903174: xfs_ail_min_lsn <-xfs_log_need_covered
> kworker/0:0-2865 [000] .... 1199.903179: xfs_ail_push_all <-xfs_log_worker
> kworker/0:0-2865 [000] .... 1199.903180: xfs_ail_push <-xfs_ail_push_all
> kworker/0:0-2865 [000] .... 1229.837372: xfs_ail_min_lsn <-xfs_log_need_covered
> kworker/0:0-2865 [000] .... 1229.837377: xfs_ail_push_all <-xfs_log_worker
> kworker/0:0-2865 [000] .... 1229.837378: xfs_ail_push <-xfs_ail_push_all
> kworker/0:0-2865 [000] .... 1259.835355: xfs_ail_min_lsn <-xfs_log_need_covered
> kworker/0:0-2865 [000] .... 1259.835360: xfs_ail_push_all <-xfs_log_worker
> kworker/0:0-2865 [000] .... 1259.835360: xfs_ail_push <-xfs_ail_push_all
> kworker/0:0-2865 [000] .... 1289.834315: xfs_ail_min_lsn <-xfs_log_need_covered
> kworker/0:0-2865 [000] .... 1289.834320: xfs_ail_push_all <-xfs_log_worker
> kworker/0:0-2865 [000] .... 1289.834321: xfs_ail_push <-xfs_ail_push_all
> kworker/0:0-2865 [000] .... 1319.831564: xfs_ail_min_lsn <-xfs_log_need_covered
> kworker/0:0-2865 [000] .... 1319.831569: xfs_ail_push_all <-xfs_log_worker
> kworker/0:0-2865 [000] .... 1319.831570: xfs_ail_push <-xfs_ail_push_all
>
> The first two xfs_unmountfs in the above log worked fine, no hang.
> The third one hung.

Well, I guess I should have been more specific. ftrace is pretty much
useless here - we need the information that is recorded by the events,
not the timing of the events.

# trace-cmd record -e xfs_ail*
....
^C
....
# trace-cmd report
version = 6
cpus=1
kworker/0:1H-1269 [000] 104529.901271: xfs_ail_insert: dev 253:16 lip 0x0xffff880017a4b740 old lsn 0/0 new lsn 1/20 type XFS_LI_BUF flags IN_AIL
kworker/0:1H-1269 [000] 104529.901273: xfs_ail_insert: dev 253:16 lip 0x0xffff880017a4bbc8 old lsn 0/0 new lsn 1/20 type XFS_LI_BUF flags IN_AIL
kworker/0:1H-1269 [000] 104529.901273: xfs_ail_insert: dev 253:16 lip 0x0xffff88001b6f8098 old lsn 0/0 new lsn 1/20 type XFS_LI_INODE flags IN_AIL
kworker/0:1H-1269 [000] 104529.901274: xfs_ail_insert: dev 253:16 lip 0x0xffff880027600c80 old lsn 0/0 new lsn 1/20 type XFS_LI_EFI flags IN_AIL
kworker/0:1H-1269 [000] 104529.901274: xfs_ail_insert: dev 253:16 lip 0x0xffff88001b6f81c8 old lsn 0/0 new lsn 1/20 type XFS_LI_INODE flags IN_AIL
kworker/0:1H-1269 [000] 104529.901274: xfs_ail_insert: dev 253:16 lip 0x0xffff88001b6f8130 old lsn 0/0 new lsn 1/20 type XFS_LI_INODE flags IN_AIL
kworker/0:1H-1269 [000] 104529.901274: xfs_ail_insert: dev 253:16 lip 0x0xffff880017a4b488 old lsn 0/0 new lsn 1/20 type XFS_LI_BUF flags IN_AIL
kworker/0:1H-1269 [000] 104529.901275: xfs_ail_insert: dev 253:16 lip 0x0xffff880017a4bcb0 old lsn 0/0 new lsn 1/20 type XFS_LI_BUF flags IN_AIL
kworker/0:1H-1269 [000] 104529.901275: xfs_ail_insert: dev 253:16 lip 0x0xffff880017a4b910 old lsn 0/0 new lsn 1/20 type XFS_LI_BUF flags IN_AIL
kworker/0:1H-1269 [000] 104529.901275: xfs_ail_insert: dev 253:16 lip 0x0xffff88001b6f8000 old lsn 0/0 new lsn 1/20 type XFS_LI_INODE flags IN_AIL
kworker/0:1H-1269 [000] 104529.901277: xfs_ail_delete: dev 253:16 lip 0x0xffff880027600c80 old lsn 1/20 new lsn 1/20 type XFS_LI_EFI flags IN_AIL
xfsaild/vdb-14828 [000] 104529.901338: xfs_ail_push: dev 253:16 lip 0x0xffff88001b6f8000 lsn 1/20 type XFS_LI_INODE flags IN_AIL
xfsaild/vdb-14828 [000] 104529.901339: xfs_ail_push: dev 253:16 lip 0x0xffff880017a4b910 lsn 1/20 type XFS_LI_BUF flags IN_AIL
xfsaild/vdb-14828 [000] 104529.901339: xfs_ail_push: dev 253:16 lip 0x0xffff880017a4bcb0 lsn 1/20 type XFS_LI_BUF flags IN_AIL
xfsaild/vdb-14828 [000] 104529.901340: xfs_ail_push: dev 253:16 lip 0x0xffff880017a4b488 lsn 1/20 type XFS_LI_BUF flags IN_AIL
.....

> [ 240.930278] xfsaild/sdb    D ffffffff8180fe40     0  5640      2 0x00000000
> [ 240.931408]  ffff88003cd5dc58 0000000000000046 ffff88003cd5dc68 ffffffff81069029
> [ 240.933116]  ffff88003cdc8000 0000000000013180 ffff88003cd5dfd8 0000000000013180
> [ 240.934655]  ffff88003daa1900 ffff88003cdc8000 ffff88003cd5dc28 0000000000000011
> [ 240.936328] Call Trace:
> [ 240.936862]  [<ffffffff81069029>] ? flush_work+0x139/0x1e0
> [ 240.937755]  [<ffffffff816c54d9>] schedule+0x29/0x70
> [ 240.938532]  [<ffffffffa04be2b2>] xlog_cil_force_lsn+0x192/0x200 [xfs]
> [ 240.939452]  [<ffffffff81081000>] ? try_to_wake_up+0x2b0/0x2b0
> [ 240.940366]  [<ffffffffa04bc4ad>] _xfs_log_force+0x6d/0x2b0 [xfs]
> [ 240.941302]  [<ffffffffa04bc729>] xfs_log_force+0x39/0xc0 [xfs]
> [ 240.942174]  [<ffffffffa04c1567>] xfsaild+0x137/0x6e0 [xfs]
> [ 240.943008]  [<ffffffffa04c1430>] ? xfs_trans_ail_cursor_first+0xa0/0xa0 [xfs]
> [ 240.944240]  [<ffffffff81071ae9>] kthread+0xc9/0xe0
> [ 240.945001]  [<ffffffff81071a20>] ? flush_kthread_worker+0xb0/0xb0
> [ 240.945884]  [<ffffffff816d14ec>] ret_from_fork+0x7c/0xb0
> [ 240.946696]  [<ffffffff81071a20>] ? flush_kthread_worker+0xb0/0xb0

Try the patch here:

http://oss.sgi.com/pipermail/xfs/2014-April/035870.html

Needs a minor fix to compile as noted here:

http://oss.sgi.com/pipermail/xfs/2014-April/035878.html

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs