On Tue, Jul 16 2024, Jan Kara wrote:

> On Thu 11-07-24 09:35:20, Luis Henriques (SUSE) wrote:
>> When a full journal commit is on-going, any fast commit has to be enqueued
>> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN.  This enqueueing
>> is done only once, i.e. if an inode is already queued in a previous fast
>> commit entry it won't be enqueued again.  However, if a full commit starts
>> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
>> be done into FC_Q_STAGING.  And this is not being done in function
>> ext4_fc_track_template().
>>
>> This patch fixes the issue by re-enqueuing an inode into the STAGING queue
>> during the fast commit clean-up callback if it has a tid (i_sync_tid)
>> greater than the one being handled.  The STAGING queue will then be spliced
>> back into MAIN.
>>
>> This bug was found using fstest generic/047.  This test creates several 32k
>> bytes files, sync'ing each of them after its creation, and then shutting
>> down the filesystem.  Some data may be lost in this operation; for example a
>> file may have its size truncated to zero.
>>
>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@xxxxxxxxx>
>
> ...
>
>> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
>> index 3926a05eceee..facbc8dbbaa2 100644
>> --- a/fs/ext4/fast_commit.c
>> +++ b/fs/ext4/fast_commit.c
>> @@ -1290,6 +1290,16 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>>  				       EXT4_STATE_FC_COMMITTING);
>>  		if (tid_geq(tid, iter->i_sync_tid))
>>  			ext4_fc_reset_inode(&iter->vfs_inode);
>> +		} else if (tid) {
>> +			/*
>> +			 * If the tid is valid (i.e. non-zero) re-enqueue the
>> +			 * inode into STAGING, which will then be splice back
>> +			 * into MAIN
>> +			 */
>> +			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
>> +				      &sbi->s_fc_q[FC_Q_STAGING]);
>> +		}
>
> I don't think this is going to work (even if we fix the tid 0 being special
> assumption). With this there would be a race like:
>
> Task 1                                    Task2
> modify inode I
> ext4_fc_commit()
>   jbd2_fc_begin_commit()
>   commits changes
>   jbd2_fc_end_commit()
>     __jbd2_fc_end_commit(journal, 0, false)
>       jbd2_journal_unlock_updates(journal)
>                                           jbd2_journal_start()
>                                           modify inode I
>                                           ...
>                                           ext4_mark_iloc_dirty()
>                                             ext4_fc_track_inode()
>                                               ext4_fc_track_template()
>                                                 - doesn't add inode anywhere
>                                                   because i_fc_list is not empty
>       ext4_fc_cleanup(journal, 0, 0)
>         removes inode I from i_fc_list => next fastcommit will not properly
>           flush it.
>
> To avoid this race I think we could move the
> journal->j_fc_cleanup_callback() call to happen before we call
> jbd2_journal_unlock_updates(). Then we are sure that inode cannot be
> modified (journal is locked) until we are done processing the fastcommit
> lists when doing fastcommit. Hence your patch could then be changed like:
>
> +		} else if (full) {
> +			/*
> +			 * We are called after a full commit, inode has been
> +			 * modified while the commit was running. Re-enqueue
> +			 * the inode into STAGING, which will then be splice
> +			 * back into MAIN. This cannot happen during
> +			 * fastcommit because the journal is locked all the
> +			 * time in that case (and tid doesn't increase so
> +			 * tid check above isn't reliable).
> +			 */
> +			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
> +				      &sbi->s_fc_q[FC_Q_STAGING]);
> +		}
>
> Later, Harshad's patches change the code to use EXT4_STATE_FC_COMMITTING
> for protecting inodes during fastcommit and that will also deal with these
> races without having to keep the whole journal locked.

OK, this looks like it should fix all the issues I was trying to fix (g/047,
g/472, and a few others Ted pointed out).  I'll go run a few more tests on
this to try to catch any possible regression.

Once again, thanks a lot for your help, Jan.

Cheers,
-- 
Luís
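
For anyone following the thread, the re-enqueue rule Jan suggests can be modelled roughly in user space as below. This is only a sketch under assumptions: every name in it (model_sb, model_inode, enqueue(), model_cleanup(), model_tid_geq()) is made up for illustration and is not a kernel API, fixed-size arrays stand in for the i_fc_list linked lists, and it assumes the cleanup callback runs while the journal is still locked in the fast commit case, as proposed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum { Q_MAIN, Q_STAGING, Q_COUNT };
#define MAX_INODES 8

struct model_inode {
	unsigned int sync_tid;	/* stands in for i_sync_tid */
	int queue;		/* Q_MAIN, Q_STAGING, or -1 when not queued */
};

struct model_sb {
	struct model_inode *q[Q_COUNT][MAX_INODES];	/* stands in for s_fc_q[] */
	size_t len[Q_COUNT];
};

/* Append an inode to one of the two queues. */
static void enqueue(struct model_sb *sb, int which, struct model_inode *ino)
{
	assert(sb->len[which] < MAX_INODES);
	sb->q[which][sb->len[which]++] = ino;
	ino->queue = which;
}

/* Wrap-safe "a >= b" on transaction ids, in the spirit of tid_geq(). */
static bool model_tid_geq(unsigned int a, unsigned int b)
{
	return (int)(a - b) >= 0;
}

/* Model of the cleanup step with the "else if (full)" branch. */
static void model_cleanup(struct model_sb *sb, bool full, unsigned int tid)
{
	struct model_inode *pending[MAX_INODES];
	size_t n = sb->len[Q_MAIN];
	size_t i;

	/* Detach everything from MAIN before deciding what to do with it. */
	for (i = 0; i < n; i++)
		pending[i] = sb->q[Q_MAIN][i];
	sb->len[Q_MAIN] = 0;

	for (i = 0; i < n; i++) {
		struct model_inode *ino = pending[i];

		ino->queue = -1;
		if (model_tid_geq(tid, ino->sync_tid))
			continue;	/* covered by this commit: drop it */
		if (full)
			/* modified while the full commit was running */
			enqueue(sb, Q_STAGING, ino);
		/*
		 * Fast commit: nothing is re-enqueued. The assumption is that
		 * the journal is locked, so the inode cannot have changed.
		 */
	}

	/* Splice STAGING back into MAIN for the next fast commit. */
	n = sb->len[Q_STAGING];
	for (i = 0; i < n; i++)
		pending[i] = sb->q[Q_STAGING][i];
	sb->len[Q_STAGING] = 0;
	for (i = 0; i < n; i++)
		enqueue(sb, Q_MAIN, pending[i]);
}
```

With two inodes queued on MAIN, one with sync_tid 5 and one with sync_tid 7, calling model_cleanup(&sb, true, 6) drops the first and leaves the second back on MAIN via STAGING, which is the behaviour the "else if (full)" branch is meant to provide.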