On 08/28/13 06:22, Dave Chinner wrote:
From: Dave Chinner<dchinner@xxxxxxxxxx> Log recovery has some strict ordering requirements which unordered or reordered metadata writeback can defeat. This can occur when an item is logged in a transaction, written back to disk, and then logged in a new transaction before the tail of the log is moved past the original modification. The result of this is that when we read an object off disk for recovery purposes, the buffer that we read may not contain the object type that recovery is expecting and hence at the end of the checkpoint being recovered we have an invalid object in memory. This isn't usually a problem, as recovery will then replay all the other checkpoints and that brings the object back to a valid and correct state, but the issue is that while the object is in the invalid state it can be flushed to disk. This results in the object verifier failing and triggering a corruption shutdown of log recover. This is correct behaviour for the verifiers - the problem is that we are not detecting that the object we've read off disk is newer than the transaction we are replaying. All metadata in v5 filesystems has the LSN of it's last modification stamped in it. This enabled log recover to read that field and determine the age of the object on disk correctly. If the LSN of the object on disk is older than the transaction being replayed, then we replay the modification. If the LSN of the object matches or is more recent than the transaction's LSN, then we should avoid overwriting the object as that is what leads to the transient corrupt state. Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx> ---
@@ -2488,7 +2595,7 @@ xlog_recover_buffer_pass2( xlog_recover_do_reg_buffer(mp, item, bp, buf_f); } if (error) - return XFS_ERROR(error); + goto out_release;
This adds a xfs_buf_relse() on the buffer in the error path. The reference was taken in this routine. The callers do not know of the buffer and can't release it. convinced me.
/* * Perform delayed write on the buffer. Asynchronous writes will be @@ -2517,6 +2624,7 @@ xlog_recover_buffer_pass2( xfs_buf_delwri_queue(bp, buffer_list); } +out_release: xfs_buf_relse(bp); return error;
Looks good. Nice to get into Linux 3.12 and possibly back to stable. Reviewed-by: Mark Tinguely <tinguely@xxxxxxx> _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs