+ ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2.patch added to -mm tree

akpm@xxxxxxxxxxxxxxxxxxxx · Mon, 29 Jan 2018 15:15:32 -0800

The patch titled
     Subject: ocfs2: return error when we attempt to access a dirty bh in jbd2
has been added to the -mm tree.  Its filename is
     ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/SubmitChecklist when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: piaojun <piaojun@xxxxxxxxxx>
Subject: ocfs2: return error when we attempt to access a dirty bh in jbd2

We should not reuse the dirty bh in jbd2 directly due to the following
situation:

1. When removing extent rec, we will dirty the bhs of extent rec and
   truncate log at the same time, and hand them over to jbd2.
2. The bhs are submitted to jbd2 area successfully.
3. The write-back thread of device help flush the bhs to disk but
   encounter write error due to abnormal storage link.
4. After a while the storage link become normal. Truncate log flush
   worker triggered by the next space reclaiming found the dirty bh of
   truncate log and clear its 'BH_Write_EIO' and then set it uptodate in
   __ocfs2_journal_access():

ocfs2_truncate_log_worker
  ocfs2_flush_truncate_log
    __ocfs2_flush_truncate_log
      ocfs2_replay_truncate_records
        ocfs2_journal_access_di
          __ocfs2_journal_access // here we clear io_error and set 'tl_bh' uptodata.

5. Then jbd2 will flush the bh of truncate log to disk, but the bh of
   extent rec is still in error state, and unfortunately nobody will
   take care of it.
6. At last the space of extent rec was not reduced, but truncate log
   flush worker have given it back to globalalloc. That will cause
   duplicate cluster problem which could be identified by fsck.ocfs2.

Sadlly we can hardly revert this but set fs read-only in case of
ruining atomicity and consistency of space reclaim.

Link: http://lkml.kernel.org/r/5A6E8092.8090701@xxxxxxxxxx
Fixes: acf8fdbe6afb ("ocfs2: do not BUG if buffer not uptodate in __ocfs2_journal_access")
Signed-off-by: Jun Piao <piaojun@xxxxxxxxxx>
Reviewed-by: Yiwen Jiang <jiangyiwen@xxxxxxxxxx>
Reviewed-by: Changwei Ge <ge.changwei@xxxxxxx>
Cc: Mark Fasheh <mfasheh@xxxxxxxxxxx>
Cc: Joel Becker <jlbec@xxxxxxxxxxxx>
Cc: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
Cc: Joseph Qi <jiangqi903@xxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 fs/ocfs2/journal.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff -puN fs/ocfs2/journal.c~ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2 fs/ocfs2/journal.c

--- a/fs/ocfs2/journal.c~ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2
+++ a/fs/ocfs2/journal.c
@@ -666,23 +666,24 @@ static int __ocfs2_journal_access(handle
 	/* we can safely remove this assertion after testing. */
 	if (!buffer_uptodate(bh)) {
 		mlog(ML_ERROR, "giving me a buffer that's not uptodate!\n");
-		mlog(ML_ERROR, "b_blocknr=%llu\n",
-		     (unsigned long long)bh->b_blocknr);
+		mlog(ML_ERROR, "b_blocknr=%llu, b_state=0x%lx\n",
+		     (unsigned long long)bh->b_blocknr, bh->b_state);
 
 		lock_buffer(bh);
 		/*
-		 * A previous attempt to write this buffer head failed.
-		 * Nothing we can do but to retry the write and hope for
-		 * the best.
+		 * A previous transaction with a couple of buffer heads fail
+		 * to checkpoint, so all the bhs are marked as BH_Write_EIO.
+		 * For current transaction, the bh is just among those error
+		 * bhs which previous transaction handle. We can't just clear
+		 * its BH_Write_EIO and reuse directly, since other bhs are
+		 * not written to disk yet and that will cause metadata
+		 * inconsistency. So we should set fs read-only to avoid
+		 * further damage.
 		 */
 		if (buffer_write_io_error(bh) && !buffer_uptodate(bh)) {
-			clear_buffer_write_io_error(bh);
-			set_buffer_uptodate(bh);
-		}
-
-		if (!buffer_uptodate(bh)) {
 			unlock_buffer(bh);
-			return -EIO;
+			return ocfs2_error(osb->sb, "A previous attempt to "
+					"write this buffer head failed\n");
 		}
 		unlock_buffer(bh);
 	}
_

Patches currently in -mm which might be from piaojun@xxxxxxxxxx are

ocfs2-return-erofs-to-mountocfs2-if-inode-block-is-invalid.patch
ocfs2-xattr-assign-errno-to-ret-in-ocfs2_calc_xattr_init.patch
ocfs2-acl-use-ip_xattr_sem-to-protect-getting-extended-attribute.patch
ocfs2-return-error-when-we-attempt-to-access-a-dirty-bh-in-jbd2.patch

--
To unsubscribe from this list: send the line "unsubscribe mm-commits" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html