Re: [PATCH v2] jbd2: Fix data missing when reusing bh which is ready to be checkpointed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



在 2023/1/6 22:22, Jan Kara 写道:

Hi Jan, thanks for reviewing.Some discussions below:


diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 6a404ac1c178..06a5e7961ef2 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1010,36 +1010,37 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
  	 * ie. locked but not dirty) or tune2fs (which may actually have
  	 * the buffer dirtied, ugh.)  */
- if (buffer_dirty(bh)) {
+	if (buffer_dirty(bh) && jh->b_transaction) {
  		/*
  		 * First question: is this buffer already part of the current
  		 * transaction or the existing committing transaction?
  		 */
-		if (jh->b_transaction) {
-			J_ASSERT_JH(jh,
-				jh->b_transaction == transaction ||
-				jh->b_transaction ==
-					journal->j_committing_transaction);
-			if (jh->b_next_transaction)
-				J_ASSERT_JH(jh, jh->b_next_transaction ==
-							transaction);
-			warn_dirty_buffer(bh);
-		}
+		J_ASSERT_JH(jh, jh->b_transaction == transaction ||
+			jh->b_transaction == journal->j_committing_transaction);
+		if (jh->b_next_transaction)
+			J_ASSERT_JH(jh, jh->b_next_transaction == transaction);
+		warn_dirty_buffer(bh);
  		/*
-		 * In any case we need to clean the dirty flag and we must
-		 * do it under the buffer lock to be sure we don't race
-		 * with running write-out.
+		 * We need to clean the dirty flag and we must do it under the
+		 * buffer lock to be sure we don't race with running write-out.
  		 */
  		JBUFFER_TRACE(jh, "Journalling dirty buffer");
  		clear_buffer_dirty(bh);
+		/*
+		 * Setting jbddirty after clearing buffer dirty is necessary.
+		 * Function jbd2_journal_restart() could keep buffer on
+		 * BJ_Reserved list until the transaction committing, then the
+		 * buffer won't be dirtied by jbd2_journal_refile_buffer()
+		 * after committing, the buffer couldn't fall on disk even
+		 * last checkpoint finished, which may corrupt filesystem.
+		 */
  		set_buffer_jbddirty(bh);
  	}

So I think the sequence:

	if (buffer_dirty(bh)) {
		warn_dirty_buffer(bh);
		JBUFFER_TRACE(jh, "Journalling dirty buffer");
		clear_buffer_dirty(bh);
		set_buffer_jbddirty(bh);
	}

can be moved into the branch

   	if (jh->b_transaction == transaction ||
	    jh->b_next_transaction == transaction) {

below. That way you can drop the assertions as well because they happen
later in do_get_write_access() again anyway.

1. If we move the squence:
 	if (buffer_dirty(bh)) {
 		warn_dirty_buffer(bh);
 		JBUFFER_TRACE(jh, "Journalling dirty buffer");
 		clear_buffer_dirty(bh);
 		set_buffer_jbddirty(bh);
 	}

into the branch

        if (jh->b_transaction == transaction ||
 	    jh->b_next_transaction == transaction) {

, then we have a new situation(jh->b_transaction == journal->j_committing_transaction) to clear buffer dirty, so we need to add an else-branch like(based on v2 patch):
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1092,6 +1092,10 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
                spin_unlock(&journal->j_list_lock);
                unlock_buffer(bh);
                goto done;
+       } else if (test_clear_buffer_dirty(bh)) {
+               warn_dirty_buffer(bh);
+               JBUFFER_TRACE(jh, "Journalling dirty buffer");
+               set_buffer_jbddirty(bh);
        }
        unlock_buffer(bh);

I think we'd better not to move the sequence?

2. I agree that the assertions in branch 'if (jh->b_transaction)' are redundant, I will remove them in v3. Thanks for pointing that.

Also I don't quite understand the new comment you have added. Do you mean
we need to not only clear BH_Dirty bit but also set BH_JBDdirty as dirtying
(through jbd2_journal_dirty_metadata()) does not have to follow after
do_get_write_access()?


Yes.
I think one reason we have non-empty commit_transaction->t_reserved_list is that: jbd2_journal_restart() could let jh attach to transaction_1 and dirty jh in transaction_2.

buffer is dirty after trans_0 committed
do_get_write_access() => jh->trans = old_handle->trans_1, clear buffer dirty & set jbddirty, BJ_Reserved jbd2_journal_restart() => stop old_handle && may jbd2_log_start_commit && start new_handle with trans_2 jbd2_journal_commit_transaction() => clear jbddirty & set buffer dirty & set jh->b_transaction = NULL do_checkpoint => buffer is fell on disk. If do_get_write_access() not mark jbddirty, buffer won't be fell on disk after checkpoint, which could corrupt filesystem.

I'm not sure whether we have the above path in realworld. I guess it exists in theory according to the comments: /* * First thing we are allowed to do is to discard any remaining * BJ_Reserved buffers. Note, it is _not_ permissible to assume * that there are no such buffers: if a large filesystem * operation like a truncate needs to split itself over multiple * transactions, then it may try to do a jbd2_journal_restart() while * there are still BJ_Reserved buffers outstanding. These must * be released cleanly from the current transaction. * * In this case, the filesystem must still reserve write access * again before modifying the buffer in the new transaction, but * we do not require it to remember exactly which old buffers it * has reserved. This is consistent with the existing behaviour * that multiple jbd2_journal_get_write_access() calls to the same * buffer are perfectly permissible. * We use journal->j_state_lock here to serialize processing of * t_reserved_list with eviction of buffers from journal_unmap_buffer(). */
        while (commit_transaction->t_reserved_list) {  [...]



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux