[PATCH RFC 2/2] xfs: fix efi/efd error handling to avoid fs shutdown hangs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Freeing an extent in XFS involves logging an EFI (extent free
intention), freeing the actual extent, and logging an EFD (extent free
done). The EFI object is created with a reference count of 2: one for
the current transaction and one for the subsequently created EFD. Under
normal circumstances, the first reference is dropped when the EFI is
unpinned and the second reference is dropped when the EFD is committed
to the on-disk log.

In event of errors or filesystem shutdown, there are various potential
cleanup scenarios depending on the state of the EFI/EFD. The cleanup
scenarios are confusing and racy, as demonstrated by the following test
sequence:

	# mount $dev $mnt
	# fsstress -d $mnt -n 99999 -p 16 -z -f fallocate=1 \
		-f punch=1 -f creat=1 -f unlink=1 &
	# sleep 5
	# killall -9 fsstress; wait
	# godown -f $mnt
	# umount

... in which the final umount can hang due to the AIL being pinned
indefinitely by one or more EFI items. This can occur due to several
conditions. For example, if the shutdown occurs after the EFI is
committed to the on-disk log and the EFD committed to the CIL, but
before the EFD committed to the log, the EFD iop_committed() abort
handler does not drop its reference to the EFI. Alternatively, manual
error injection in the xfs_bmap_finish() codepath shows that if an error
occurs after the EFI transaction is committed but before the EFD is
constructed and logged, the EFI is never released from the AIL.

Update the EFI/EFD item handling code to use a more straightforward and
reliable approach to error handling. If the EFI transaction is
cancelled, the EFI is freed when the log item is unlocked. If the EFI
transaction is committed successfully, from that point forward it is the
responsibility of the EFD to drop its EFI reference. This means that the
EFI unpin callback only ever drops the log reference to the EFI. It does
not free the EFI in the event of log I/O error. This also means that the
EFD item must drop its EFI reference either if the EFD transaction is
cancelled, committed or itself aborted due to log I/O error. Finally,
update xfs_bmap_finish() to ensure that once an EFI transaction is
committed, we are guaranteed to construct and log the associated EFD.
This ensures that the EFD is aborted and drops the reference to the EFI.

Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
---
 fs/xfs/xfs_bmap_util.c    | 35 ++++++++++++++++++++--------
 fs/xfs/xfs_extfree_item.c | 59 +++++++++++++++++++++++++----------------------
 2 files changed, 57 insertions(+), 37 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 0f34886c..878b33a 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -88,20 +88,36 @@ xfs_bmap_finish(
 		xfs_trans_log_efi_extent(*tp, efi, free->xbfi_startblock,
 			free->xbfi_blockcount);
 
-	error = xfs_trans_roll(tp, NULL);
-	*committed = 1;
+	error = __xfs_trans_roll(tp, NULL, committed);
+
 	/*
-	 * We have a new transaction, so we should return committed=1,
-	 * even though we're returning an error.
+	 * We have a new transaction, so we should return committed=1, even
+	 * though we're returning an error. If an error was returned after the
+	 * original transaction was committed, defer the error handling until
+	 * the EFD is logged. We do this because a committed EFI requires an EFD
+	 * transaction to be processed to ensure the EFI is released.
 	 */
-	if (error)
+	if (error && *committed == 0) {
+		*committed = 1;
 		return error;
+	}
 
 	efd = xfs_trans_get_efd(*tp, efi, flist->xbf_count);
 	for (free = flist->xbf_first; free != NULL; free = next) {
 		next = free->xbfi_next;
-		if ((error = xfs_free_extent(*tp, free->xbfi_startblock,
-				free->xbfi_blockcount))) {
+
+		/*
+		 * Free the extent if the above trans roll hasn't failed and log
+		 * the EFD before handling errors from either call to ensure the
+		 * EFI reference is accounted for in the tp. Otherwise, the EFI
+		 * is never released on abort and pins the AIL indefinitely.
+		 */
+		if (!error)
+			error = xfs_free_extent(*tp, free->xbfi_startblock,
+						free->xbfi_blockcount);
+		xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock,
+					 free->xbfi_blockcount);
+		if (error) {
 			/*
 			 * The bmap free list will be cleaned up at a
 			 * higher level.  The EFI will be canceled when
@@ -118,11 +134,10 @@ xfs_bmap_finish(
 						   SHUTDOWN_META_IO_ERROR);
 			return error;
 		}
-		xfs_trans_log_efd_extent(*tp, efd, free->xbfi_startblock,
-			free->xbfi_blockcount);
 		xfs_bmap_del_free(flist, NULL, free);
 	}
-	return 0;
+
+	return error;
 }
 
 int
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index adc8f8f..0e426ab 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -128,12 +128,12 @@ xfs_efi_item_pin(
 }
 
 /*
- * While EFIs cannot really be pinned, the unpin operation is the last place at
- * which the EFI is manipulated during a transaction.  If we are being asked to
- * remove the EFI it's because the transaction has been cancelled and by
- * definition that means the EFI cannot be in the AIL so remove it from the
- * transaction and free it.  Otherwise coordinate with xfs_efi_release()
- * to determine who gets to free the EFI.
+ * The unpin operation is the last place an EFI is manipulated in the log. It is
+ * either inserted in the AIL or aborted in the event of a log I/O error. In
+ * either case, the EFI transaction has been successfully committed to make it
+ * this far. Therefore, we expect whoever committed the EFI to construct the EFD
+ * and manage the EFD's reference to the EFI appropriately. Simply drop the
+ * log's EFI reference now that the log is done with it.
  */
 STATIC void
 xfs_efi_item_unpin(
@@ -141,14 +141,6 @@ xfs_efi_item_unpin(
 	int			remove)
 {
 	struct xfs_efi_log_item	*efip = EFI_ITEM(lip);
-
-	if (remove) {
-		ASSERT(!(lip->li_flags & XFS_LI_IN_AIL));
-		if (lip->li_desc)
-			xfs_trans_del_item(lip);
-		xfs_efi_item_free(efip);
-		return;
-	}
 	__xfs_efi_release(efip);
 }
 
@@ -167,6 +159,11 @@ xfs_efi_item_push(
 	return XFS_ITEM_PINNED;
 }
 
+/*
+ * The EFI has been either committed or aborted if the transaction has been
+ * cancelled. If the transaction was cancelled, an EFD isn't going to be
+ * constructed and thus we free the EFI here directly.
+ */
 STATIC void
 xfs_efi_item_unlock(
 	struct xfs_log_item	*lip)
@@ -415,20 +412,27 @@ xfs_efd_item_push(
 	return XFS_ITEM_PINNED;
 }
 
+/*
+ * The EFD is either committed or aborted if the transaction is cancelled. If
+ * the transaction is cancelled, drop our reference to the EFI and free the EFD.
+ */
 STATIC void
 xfs_efd_item_unlock(
 	struct xfs_log_item	*lip)
 {
-	if (lip->li_flags & XFS_LI_ABORTED)
-		xfs_efd_item_free(EFD_ITEM(lip));
+	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
+
+	if (lip->li_flags & XFS_LI_ABORTED) {
+		xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
+		xfs_efd_item_free(efdp);
+	}
 }
 
 /*
- * When the efd item is committed to disk, all we need to do
- * is delete our reference to our partner efi item and then
- * free ourselves.  Since we're freeing ourselves we must
- * return -1 to keep the transaction code from further referencing
- * this item.
+ * When the efd item is committed to disk, all we need to do is delete our
+ * reference to our partner efi item and then free ourselves. Since we're
+ * freeing ourselves we must return -1 to keep the transaction code from further
+ * referencing this item.
  */
 STATIC xfs_lsn_t
 xfs_efd_item_committed(
@@ -438,14 +442,15 @@ xfs_efd_item_committed(
 	struct xfs_efd_log_item	*efdp = EFD_ITEM(lip);
 
 	/*
-	 * If we got a log I/O error, it's always the case that the LR with the
-	 * EFI got unpinned and freed before the EFD got aborted.
+	 * Drop the efi reference regardless of whether this item has been
+	 * aborted. Once the EFI transaction is successfully committed, it is
+	 * the sole responsibility of the EFD to clean up the EFI (even if the
+	 * EFI is aborted due to log I/O error).
 	 */
-	if (!(lip->li_flags & XFS_LI_ABORTED))
-		xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
-
+	xfs_efi_release(efdp->efd_efip, efdp->efd_format.efd_nextents);
 	xfs_efd_item_free(efdp);
-	return (xfs_lsn_t)-1;
+
+	return (xfs_lsn_t) -1;
 }
 
 /*
-- 
2.1.0

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs



[Index of Archives]     [Linux XFS Devel]     [Linux Filesystem Development]     [Filesystem Testing]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux