Hi all,

Here's a first real stab at a fix for the log tail overwrite issue. The
general approach is similar to torn write detection: move the tail
forward when corruption is detected within the range of a possible tail
overwrite.

Patch 1 fixes an independent and spurious log recovery failure when a
log record header wraps around the end of the physical log. Patch 2 is a
semi-preparatory patch that unconditionally invokes log tail
verification rather than only after torn write detection at the head.
Patch 3 introduces the core fix to move the tail forward in the event of
corruption. Patch 4 introduces an error injection tag to force log item
pinning and facilitates the test that reliably reproduces the tail
overwrite problem.

This survives the latest variant of the xfstests test that reproduces
the tail overwrite condition and otherwise hasn't shown any regressions
in my testing so far (still ongoing). This also allows the metadump
images provided by Sweet Tea[1] to mount (though those images do still
show filesystem corruption after mount, so I suspect something more is
going on there).

One other slight change worth noting in log recovery behavior is that
tail overwrite detection causes earlier reporting of legitimate log CRC
or corruption errors. Before this series, a log corruption that is not
resolved by torn write/tail overwrite detection results in log recovery
failure after a partial recovery up to the point at which the corruption
is encountered. After this series, it is very likely that the corruption
is identified during tail verification and an error returned to
userspace before real recovery begins.

An xfs_repair is necessary in either case, but I'm curious if there is a
preference towards the old or newly proposed behavior. An alternative
I've considered to preserve the old behavior, for example, is to use the
tail verification CRC pass for tail fixing only (and otherwise consider
errors at this point as nonfatal).
This means that we would fix up the tail if possible, otherwise leave
errors to the real recovery sequence such that a partial recovery can
occur before the (imminent) failure.

Thoughts?

Brian

v1:
- Add patch to fix log recovery header wrapping problem.
- Replace transaction reservation rfc with log recovery based fix.
- Replace custom log pinning sysfs knob with error injection tag.
rfc: http://www.spinics.net/lists/linux-xfs/msg07623.html

[1] http://www.spinics.net/lists/linux-xfs/msg07667.html

Brian Foster (4):
  xfs: fix recovery failure when log record header wraps log end
  xfs: always verify the log tail during recovery
  xfs: fix log recovery corruption error due to tail overwrite
  xfs: add log item pinning error injection tag

 fs/xfs/xfs_error.c       |   3 +
 fs/xfs/xfs_error.h       |   4 +-
 fs/xfs/xfs_log_recover.c | 150 +++++++++++++++++++++++++++++------------------
 fs/xfs/xfs_trans_ail.c   |  17 +++++-
 4 files changed, 114 insertions(+), 60 deletions(-)

--
2.7.5
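
For readers who want a concrete picture of the policy described in this
cover letter (move the tail forward when corruption falls within the
range of a possible tail overwrite, otherwise report an error), here is
a small userspace model. It is only a sketch under loud assumptions: it
is not the code from patch 3, every name in it (toy_rec,
toy_verify_tail, max_overwrite) is hypothetical, and the overwrite
window is counted in whole records purely to keep the example short.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for one log record between the tail and the head. */
struct toy_rec {
	bool crc_ok;	/* true if the record passed CRC verification */
};

/*
 * Toy tail verification pass (hypothetical, not the kernel code).
 *
 * recs[0] is the record at the current tail and recs[nrecs - 1] is the
 * record at the head. max_overwrite is the assumed number of records,
 * counted from the tail, that a tail overwrite could have damaged.
 *
 * A bad record inside that window moves the tail past it. A bad record
 * outside the window is treated as legitimate corruption and the
 * function returns false before any "real" recovery would start.
 *
 * On success, *new_tail holds the index of the first record to recover
 * (it equals nrecs if every record was skipped).
 */
static bool toy_verify_tail(const struct toy_rec *recs, size_t nrecs,
			    size_t max_overwrite, size_t *new_tail)
{
	size_t tail = 0;
	size_t i;

	for (i = 0; i < nrecs; i++) {
		if (recs[i].crc_ok)
			continue;
		if (i >= max_overwrite)
			return false;	/* corruption a tail overwrite cannot explain */
		tail = i + 1;		/* skip past the overwritten record */
	}

	*new_tail = tail;
	return true;
}
```

In this model the alternative raised above (use the CRC pass for tail
fixing only) would amount to stopping the scan at the first bad record
outside the window without returning an error, and leaving that record
for the real recovery pass to trip over.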