[PATCH] xfs: serialize iclog write of xlog_cil_push

Mark Tinguely <tinguely@xxxxxxx> · Sat, 05 Jan 2013 14:34:15 -0600

The back-end of xlog_cil_push() allows multiple push sequences 
to write to the xlog at the same time. This will cause problems
for recovery and also could cause the xlog_cil_committed() callback
to be called out of sequence.

This was discovered with an EFI/EFD misorder. There are several (5) active
sequence pushes and 3 completed pushes. The callback for the sequence (2)
holding the EFD is being processed but the callback for the sequence (1)
holding the EFI has not yet been processed.

The xlog_cil_committed() callback misorder happens because the buffer that
contains the sequence ticket is filled by another sequence push and the
callback for the buffer write happens before the callback is placed onto
that buffer.

This patch serializes the xlog_write() so that only one sequence (the lowest)
is written at a time. This will also stop the race between xlog_commit_record()
and the adding of the callback onto the buffer containing the sequence commit
record.
 
---
 fs/xfs/xfs_log_cil.c |   44 +++++++++++++++++++-------------------------
 1 file changed, 19 insertions(+), 25 deletions(-)

Index: b/fs/xfs/xfs_log_cil.c
===================================================================

--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -390,6 +390,7 @@ xlog_cil_push(
 	struct xfs_log_vec	lvhdr = { NULL };
 	xfs_lsn_t		commit_lsn;
 	xfs_lsn_t		push_seq;
+	xfs_lsn_t		lowest_seq;
 
 	if (!cil)
 		return 0;
@@ -495,6 +496,24 @@ xlog_cil_push(
 	up_write(&cil->xc_ctx_lock);
 
 	/*
+	 * write the entried and checkpoint into the log. Strictly order
+	 * the commit records so replay will get them in the right order.
+	 */
+restart:
+	spin_lock(&cil->xc_cil_lock);
+	lowest_seq = ctx->sequence;
+	list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
+		/* find the lowest sequence not yet committed */
+		if (lowest_seq > new_ctx->sequence && !new_ctx->commit_lsn)
+			lowest_seq = new_ctx->sequence;
+	}
+	if (ctx->sequence != lowest_seq) {
+		xlog_wait(&cil->xc_commit_wait, &cil->xc_cil_lock);
+		goto restart;
+	}
+	spin_unlock(&cil->xc_cil_lock);
+
+	/*
 	 * Build a checkpoint transaction header and write it to the log to
 	 * begin the transaction. We need to account for the space used by the
 	 * transaction header here as it is not accounted for in xlog_write().
@@ -521,30 +540,6 @@ xlog_cil_push(
 	if (error)
 		goto out_abort_free_ticket;
 
-	/*
-	 * now that we've written the checkpoint into the log, strictly
-	 * order the commit records so replay will get them in the right order.
-	 */
-restart:
-	spin_lock(&cil->xc_cil_lock);
-	list_for_each_entry(new_ctx, &cil->xc_committing, committing) {
-		/*
-		 * Higher sequences will wait for this one so skip them.
-		 * Don't wait for own own sequence, either.
-		 */
-		if (new_ctx->sequence >= ctx->sequence)
-			continue;
-		if (!new_ctx->commit_lsn) {
-			/*
-			 * It is still being pushed! Wait for the push to
-			 * complete, then start again from the beginning.
-			 */
-			xlog_wait(&cil->xc_commit_wait, &cil->xc_cil_lock);
-			goto restart;
-		}
-	}
-	spin_unlock(&cil->xc_cil_lock);
-
 	/* xfs_log_done always frees the ticket on error. */
 	commit_lsn = xfs_log_done(log->l_mp, tic, &commit_iclog, 0);
 	if (commit_lsn == -1)
@@ -556,7 +551,6 @@ restart:
 	error = xfs_log_notify(log->l_mp, commit_iclog, &ctx->log_cb);
 	if (error)
 		goto out_abort;
-
 	/*
 	 * now the checkpoint commit is complete and we've attached the
 	 * callbacks to the iclog we can assign the commit LSN to the context


_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs