The patch titled Subject: ocfs2: return error when we attempt to access a dirty bh in jbd2 has been added to the -mm tree. Its filename is ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2.patch This patch should soon appear at http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2.patch and later at http://ozlabs.org/~akpm/mmotm/broken-out/ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2.patch Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/SubmitChecklist when testing your code *** The -mm tree is included into linux-next and is updated there every 3-4 working days ------------------------------------------------------ From: piaojun <piaojun@xxxxxxxxxx> Subject: ocfs2: return error when we attempt to access a dirty bh in jbd2 We should not reuse the dirty bh in jbd2 directly due to the following situation: 1. When removing extent rec, we will dirty the bhs of extent rec and truncate log at the same time, and hand them over to jbd2. 2. The bhs are not flushed to disk due to abnormal storage link. 3. After a while the storage link become normal. Truncate log flush worker triggered by the next space reclaiming found the dirty bh of truncate log and clear its 'BH_Write_EIO' and then set it uptodate in __ocfs2_journal_access(): ocfs2_truncate_log_worker ocfs2_flush_truncate_log __ocfs2_flush_truncate_log ocfs2_replay_truncate_records ocfs2_journal_access_di __ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata. 4. Then jbd2 will flush the bh of truncate log to disk, but the bh of extent rec is still in error state, and unfortunately nobody will take care of it. 5. At last the space of extent rec was not reduced, but truncate log flush worker have given it back to globalalloc. That will cause duplicate cluster problem which could be identified by fsck.ocfs2. So we should return -EIO in case of ruining atomicity and consistency of space reclaim. Link: http://lkml.kernel.org/r/5A6943BC.5010406@xxxxxxxxxx Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access") Signed-off-by: Jun Piao <piaojun@xxxxxxxxxx> Reviewed-by: Yiwen Jiang <jiangyiwen@xxxxxxxxxx> Cc: Mark Fasheh <mfasheh@xxxxxxxxxxx> Cc: Joel Becker <jlbec@xxxxxxxxxxxx> Cc: Junxiao Bi <junxiao.bi@xxxxxxxxxx> Cc: Joseph Qi <jiangqi903@xxxxxxxxx> Cc: Changwei Ge <ge.changwei@xxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- fs/ocfs2/journal.c | 45 +++++++++++++++++++++++++++++++++---------- 1 file changed, 35 insertions(+), 10 deletions(-) diff -puN fs/ocfs2/journal.c~ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2 fs/ocfs2/journal.c --- a/fs/ocfs2/journal.c~ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2 +++ a/fs/ocfs2/journal.c @@ -666,21 +666,46 @@ static int __ocfs2_journal_access(handle /* we can safely remove this assertion after testing. */ if (!buffer_uptodate(bh)) { mlog(ML_ERROR, "giving me a buffer that's not uptodate!\n"); - mlog(ML_ERROR, "b_blocknr=%llu\n", - (unsigned long long)bh->b_blocknr); + mlog(ML_ERROR, "b_blocknr=%llu, b_state=0x%lx\n", + (unsigned long long)bh->b_blocknr, bh->b_state); lock_buffer(bh); /* - * A previous attempt to write this buffer head failed. - * Nothing we can do but to retry the write and hope for - * the best. + * We should not reuse the dirty bh directly due to the + * following situation: + * + * 1. When removing extent rec, we will dirty the bhs of + * extent rec and truncate log at the same time, and + * hand them over to jbd2. + * 2. The bhs are not flushed to disk due to abnormal + * storage link. + * 3. After a while the storage link become normal. + * Truncate log flush worker triggered by the next + * space reclaiming found the dirty bh of truncate log + * and clear its 'BH_Write_EIO' and then set it uptodate + * in __ocfs2_journal_access(): + * + * ocfs2_truncate_log_worker + * ocfs2_flush_truncate_log + * __ocfs2_flush_truncate_log + * ocfs2_replay_truncate_records + * ocfs2_journal_access_di + * __ocfs2_journal_access + * + * 4. Then jbd2 will flush the bh of truncate log to disk, + * but the bh of extent rec is still in error state, and + * unfortunately nobody will take care of it. + * 5. At last the space of extent rec was not reduced, + * but truncate log flush worker have given it back to + * globalalloc. That will cause duplicate cluster problem + * which could be identified by fsck.ocfs2. + * + * So we should return -EIO in case of ruining atomicity + * and consistency of space reclaim. */ if (buffer_write_io_error(bh) && !buffer_uptodate(bh)) { - clear_buffer_write_io_error(bh); - set_buffer_uptodate(bh); - } - - if (!buffer_uptodate(bh)) { + mlog(ML_ERROR, "A previous attempt to write this " + "buffer head failed\n"); unlock_buffer(bh); return -EIO; } _ Patches currently in -mm which might be from piaojun@xxxxxxxxxx are ocfs2-return-erofs-to-mountocfs2-if-inode-block-is-invalid.patch ocfs2-xattr-assign-errno-to-ret-in-ocfs2_calc_xattr_init.patch ocfs2-acl-use-ip_xattr_sem-to-protect-getting-extended-attribute.patch ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2.patch -- To unsubscribe from this list: send the line "unsubscribe mm-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html