The back-end of xlog_cil_push() allows multiple push sequences to write to the xlog at the same time. This will cause problems for recovery and also could cause the xlog_cil_committed() callback to be called out of sequence. This was discovered with an EFI/EFD misorder. There are several (5) active sequence pushes and 3 completed pushes. The callback for the sequence (2) holding the EFD is being processed but the callback for the sequence (1) holding the EFI has not yet been processed. The xlog_cil_committed() callback misorder happens because the buffer that contains the sequence ticket is filled by another sequence push and the callback for the buffer write happens before the callback is placed onto that buffer. This patch serializes the xlog_write() so that only one sequence (the lowest) is written at a time. This will also stop the race between xlog_commit_record() and the adding of the callback onto the buffer containing the sequence commit record. --- fs/xfs/xfs_log_cil.c | 44 +++++++++++++++++++------------------------- 1 file changed, 19 insertions(+), 25 deletions(-) Index: b/fs/xfs/xfs_log_cil.c =================================================================== --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -390,6 +390,7 @@ xlog_cil_push( struct xfs_log_vec lvhdr = { NULL }; xfs_lsn_t commit_lsn; xfs_lsn_t push_seq; + xfs_lsn_t lowest_seq; if (!cil) return 0; @@ -495,6 +496,24 @@ xlog_cil_push( up_write(&cil->xc_ctx_lock); /* + * write the entried and checkpoint into the log. Strictly order + * the commit records so replay will get them in the right order. + */ +restart: + spin_lock(&cil->xc_cil_lock); + lowest_seq = ctx->sequence; + list_for_each_entry(new_ctx, &cil->xc_committing, committing) { + /* find the lowest sequence not yet committed */ + if (lowest_seq > new_ctx->sequence && !new_ctx->commit_lsn) + lowest_seq = new_ctx->sequence; + } + if (ctx->sequence != lowest_seq) { + xlog_wait(&cil->xc_commit_wait, &cil->xc_cil_lock); + goto restart; + } + spin_unlock(&cil->xc_cil_lock); + + /* * Build a checkpoint transaction header and write it to the log to * begin the transaction. We need to account for the space used by the * transaction header here as it is not accounted for in xlog_write(). @@ -521,30 +540,6 @@ xlog_cil_push( if (error) goto out_abort_free_ticket; - /* - * now that we've written the checkpoint into the log, strictly - * order the commit records so replay will get them in the right order. - */ -restart: - spin_lock(&cil->xc_cil_lock); - list_for_each_entry(new_ctx, &cil->xc_committing, committing) { - /* - * Higher sequences will wait for this one so skip them. - * Don't wait for own own sequence, either. - */ - if (new_ctx->sequence >= ctx->sequence) - continue; - if (!new_ctx->commit_lsn) { - /* - * It is still being pushed! Wait for the push to - * complete, then start again from the beginning. - */ - xlog_wait(&cil->xc_commit_wait, &cil->xc_cil_lock); - goto restart; - } - } - spin_unlock(&cil->xc_cil_lock); - /* xfs_log_done always frees the ticket on error. */ commit_lsn = xfs_log_done(log->l_mp, tic, &commit_iclog, 0); if (commit_lsn == -1) @@ -556,7 +551,6 @@ restart: error = xfs_log_notify(log->l_mp, commit_iclog, &ctx->log_cb); if (error) goto out_abort; - /* * now the checkpoint commit is complete and we've attached the * callbacks to the iclog we can assign the commit LSN to the context _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs