On 2013年12月29日 00:19, Mark Tinguely wrote: > On 12/24/13 06:48, Jeff Liu wrote: >> From: Jie Liu<jeff.liu@xxxxxxxxxx> >> >> With fsstress+godown test I observed an XFS hang up during umount which >> yielding a backtrace like below: >> >> [20876.193635] INFO: task umount:9853 blocked for more than 120 seconds. >> [20876.193641] Tainted: PF O 3.13.0-rc2+ #8 >> [20876.193643] "echo 0> /proc/sys/kernel/hung_task_timeout_secs" >> disables this message. >> [20876.193645] umount D ffff88026f294440 0 9853 9372 >> <snip> >> [20876.193663] Call Trace: >> [20876.193672] [<ffffffff816f3829>] schedule+0x29/0x70 >> [20876.193701] [<ffffffffa0c4a3e9>] xfs_ail_push_all_sync+0xa9/0xe0 >> [xfs] >> [20876.193707] [<ffffffff810a83f0>] ? prepare_to_wait_event+0x100/0x100 >> [20876.193726] [<ffffffffa0be9df1>] xfs_unmountfs+0x61/0x150 [xfs] >> [20876.193746] [<ffffffffa0becd41>] xfs_fs_put_super+0x21/0x60 [xfs] >> [20876.193751] [<ffffffff811bbf62>] generic_shutdown_super+0x72/0xf0 >> [20876.193754] [<ffffffff811bc217>] kill_block_super+0x27/0x70 >> [20876.193757] [<ffffffff811bc4fd>] deactivate_locked_super+0x3d/0x60 >> [20876.193761] [<ffffffff811bcab6>] deactivate_super+0x46/0x60 >> [20876.193765] [<ffffffff811d9146>] mntput_no_expire+0xd6/0x170 >> [20876.193769] [<ffffffff811da64e>] SyS_umount+0x8e/0x100 >> [20876.193774] [<ffffffff816ffd6d>] system_call_fastpath+0x1a/0x1f >> >> As per above backtraces, the umount process is already scheduled out >> in xfs_ail_push_all_sync() because it should push out all of pending >> changes in AIL and wait until the AIL is empty. Then it will wake up >> xfsaild thread to do the actual flushing business. However, I found >> that the AIL does not became empty in some situations because of some >> EFI are still being on it, but in EFI's iop_push operation, we always >> returning XFS_ITEM_PINNED which leads to the xfsaild thread suffering >> into an infinite loop. >> >> Since EFI items have no locking or pushing, they are pulled from the >> AIL when their corresponding EFDs are committed to disk, and we have >> guaranteed that the EFI should not be freed until it has been unppined >> and the EFD has been committed in commit 666d644cd7, this is done via >> an EFI reference count by initializing it to 2 in xfs_efi_init() -- one >> is it's own count which is not released until it is unpinned, the other >> one is taken by its corresponding EFD which will be released during EFD >> commit operation. >> >> IMHO we should always releasing it's reference to the corresponding EFI >> item once the EFD item is committed to disk regardless of the log item >> is marked with XFS_LI_ABORTED flag or not. >> >> Signed-off-by: Jie Liu<jeff.liu@xxxxxxxxxx> >> --- >> fs/xfs/xfs_extfree_item.c | 8 +------- >> 1 file changed, 1 insertion(+), 7 deletions(-) >> >> diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c >> index 3680d04..16c0396 100644 >> --- a/fs/xfs/xfs_extfree_item.c >> +++ b/fs/xfs/xfs_extfree_item.c >> @@ -437,13 +437,7 @@ xfs_efd_item_committed( >> { >> struct xfs_efd_log_item *efdp = EFD_ITEM(lip); >> >> - /* >> - * If we got a log I/O error, it's always the case that the LR >> with the >> - * EFI got unpinned and freed before the EFD got aborted. >> - */ >> - if (!(lip->li_flags& XFS_LI_ABORTED)) >> - xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); >> - >> + xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents); >> xfs_efd_item_free(efdp); >> return (xfs_lsn_t)-1; >> } > Hi Mark, > Hi Jeff. > > This would work if the forced shutdown happened after both the EFI and > EFD transaction were committed and successfully placed on the CIL. Yep. > If the sequence went EFI commit, CIL push (EFI is now in the AIL), > forced shutdown, and then EFD commit. In this sequence, the EFD item > would not be placed on the CIL and therefore the iop.committed would not > be called. In this patch only iop_committing and iop_unlock would be run > on the EFD item. Agree, it seems we have to release EFI reference count in xfs_efd_item_unlock() if XFS_LI_ABORTED is detected. However, the story was not yet completed, I just thought another possible order between EFI/EFD. If EFD is committed into CIL and just before calling xfs_efd_item_committed(), the EFI transaction commit is aborted and thus the EFI item is released in it's iop_unlock() callback, hence my current fix also has problem, i.e, in iop_committed(), it will try to drop an EFI reference count which is already released. This problem seems a bit complicated than I thought before, let me think it over once back from vacations until 02, Jan -- no development environment on hand... Thanks, -Jeff _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs