We have seen somewhat rare reports of the following assert from xlog_cil_push_background() failing during ltp tests or somewhat innocuous desktop root fs workloads (e.g., virt operations, initramfs construction): ASSERT(!list_empty(&cil->xc_cil)); The reasoning behind the assert is that the transaction has inserted items to the CIL and hit background push codepath all with cil->xc_ctx_lock held for reading. This locks out background commit from emptying the CIL, which acquires the lock for writing. Therefore, the reasoning is that the items previously inserted in the CIL should still be present. The cil->xc_ctx_lock read lock is not sufficient to protect the xc_cil list, however, due to how CIL insertion is handled. xlog_cil_insert_items() inserts and reorders the dirty transaction items to the tail of the CIL under xc_cil_lock. It uses list_move_tail() to achieve insertion and reordering in the same block of code. This function removes and reinserts an item to the tail of the list. If a transaction commits an item that was already logged and thus already resides in the CIL, and said item is the sole item on the list, the removal and reinsertion creates a temporary state where the list is actually empty. This state is not valid and thus should never be observed by concurrent transaction commit-side checks in the circumstances outlined above. Update all of the xc_cil checks to acquire xc_cil_lock before assessing the state of xc_cil. Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> --- fs/xfs/xfs_log_cil.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c index abc2ccb..324d449 100644 --- a/fs/xfs/xfs_log_cil.c +++ b/fs/xfs/xfs_log_cil.c @@ -687,7 +687,7 @@ xlog_cil_push_background( * The cil won't be empty because we are called while holding the * context lock so whatever we added to the CIL will still be there */ - ASSERT(!list_empty(&cil->xc_cil)); + ASSERT(!xlog_cil_empty(log)); /* * don't do a background push if we haven't used up all the @@ -731,13 +731,17 @@ xlog_cil_push_now( * there's no work we need to do. */ spin_lock(&cil->xc_push_lock); + spin_lock(&cil->xc_cil_lock); if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) { + spin_unlock(&cil->xc_cil_lock); spin_unlock(&cil->xc_push_lock); return; } + spin_unlock(&cil->xc_cil_lock); cil->xc_push_seq = push_seq; queue_work(log->l_mp->m_cil_workqueue, &cil->xc_push_work); + spin_unlock(&cil->xc_push_lock); } @@ -749,8 +753,10 @@ xlog_cil_empty( bool empty = false; spin_lock(&cil->xc_push_lock); + spin_lock(&cil->xc_cil_lock); if (list_empty(&cil->xc_cil)) empty = true; + spin_unlock(&cil->xc_cil_lock); spin_unlock(&cil->xc_push_lock); return empty; } @@ -887,12 +893,15 @@ restart: * it means we haven't yet started the push, because if it had started * we would have found the context on the committing list. */ + spin_lock(&cil->xc_cil_lock); if (sequence == cil->xc_current_sequence && !list_empty(&cil->xc_cil)) { + spin_unlock(&cil->xc_cil_lock); spin_unlock(&cil->xc_push_lock); goto restart; } + spin_unlock(&cil->xc_cil_lock); spin_unlock(&cil->xc_push_lock); return commit_lsn; -- 1.9.3 _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs