Re: [PATCH v4] ext4: fix fast commit inode enqueueing during a full journal commit

Jan Kara <jack@xxxxxxx> · Tue, 16 Jul 2024 12:24:16 +0200

On Thu 11-07-24 09:35:20, Luis Henriques (SUSE) wrote:
> When a full journal commit is on-going, any fast commit has to be enqueued
> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN.  This enqueueing
> is done only once, i.e. if an inode is already queued in a previous fast
> commit entry it won't be enqueued again.  However, if a full commit starts
> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
> be done into FC_Q_STAGING.  And this is not being done in function
> ext4_fc_track_template().
> 
> This patch fixes the issue by re-enqueuing an inode into the STAGING queue
> during the fast commit clean-up callback if it has a tid (i_sync_tid)
> greater than the one being handled.  The STAGING queue will then be spliced
> back into MAIN.
> 
> This bug was found using fstest generic/047.  This test creates several
> 32k-byte files, sync'ing each of them after its creation, and then shutting
> down the filesystem.  Some data may be lost in this operation; for example a
> file may have its size truncated to zero.
> 
> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@xxxxxxxxx>

...

> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> index 3926a05eceee..facbc8dbbaa2 100644
> --- a/fs/ext4/fast_commit.c
> +++ b/fs/ext4/fast_commit.c
> @@ -1290,6 +1290,16 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>  				       EXT4_STATE_FC_COMMITTING);
>  		if (tid_geq(tid, iter->i_sync_tid))
>  			ext4_fc_reset_inode(&iter->vfs_inode);
> +		} else if (tid) {
> +			/*
> +			 * If the tid is valid (i.e. non-zero) re-enqueue the
> +			 * inode into STAGING, which will then be splice back
> +			 * into MAIN
> +			 */
> +			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
> +				      &sbi->s_fc_q[FC_Q_STAGING]);
> +		}

I don't think this is going to work (even if we fix the assumption that
tid 0 is special). With this change there would be a race like:

Task 1					Task 2
modify inode I
ext4_fc_commit()
  jbd2_fc_begin_commit()
  commits changes
  jbd2_fc_end_commit()
    __jbd2_fc_end_commit(journal, 0, false)
      jbd2_journal_unlock_updates(journal)
					jbd2_journal_start()
					modify inode I
					...
					ext4_mark_iloc_dirty()
					  ext4_fc_track_inode()
					    ext4_fc_track_template()
					      - doesn't add inode anywhere
					      because i_fc_list is not empty
      ext4_fc_cleanup(journal, 0, 0)
        removes inode I from i_fc_list => next fastcommit will not
        properly flush it.

To avoid this race I think we could move the
journal->j_fc_cleanup_callback() call to happen before we call
jbd2_journal_unlock_updates(). Then we are sure that the inode cannot be
modified (the journal is locked) until we are done processing the fastcommit
lists when doing a fastcommit. Hence your patch could then be changed like:

+		} else if (full) {
+			/*
+			 * We are called after a full commit, inode has been
+			 * modified while the commit was running. Re-enqueue
+			 * the inode into STAGING, which will then be spliced
+			 * back into MAIN. This cannot happen during
+			 * fastcommit because the journal is locked all the
+			 * time in that case (and tid doesn't increase so
+			 * tid check above isn't reliable).
+			 */
+			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
+				      &sbi->s_fc_q[FC_Q_STAGING]);
+		}

Later, Harshad's patches change the code to use EXT4_STATE_FC_COMMITTING
for protecting inodes during fastcommit and that will also deal with these
races without having to keep the whole journal locked.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR