On Wed 22 May 2024 12:35:45 PM +02, Jan Kara wrote; > On Tue 21-05-24 16:45:34, Luis Henriques (SUSE) wrote: >> When a full journal commit is on-going, any fast commit has to be enqueued >> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing >> is done only once, i.e. if an inode is already queued in a previous fast >> commit entry it won't be enqueued again. However, if a full commit starts >> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to >> be done into FC_Q_STAGING. And this is not being done in function >> ext4_fc_track_template(). > > Ah, good catch. > >> This patch fixes the issue by simply re-enqueuing the inode from the MAIN >> into the STAGING queue. >> >> This bug was found using fstest generic/047. This test creates several 32k >> bytes files, sync'ing each of them after it's creation, and then shutting >> down the filesystem. Some data may be loss in this operation; for example a >> file may have it's size truncated to zero. >> >> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@xxxxxxxxx> >> --- >> fs/ext4/fast_commit.c | 19 +++++++++++++------ >> 1 file changed, 13 insertions(+), 6 deletions(-) >> >> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c >> index 87c009e0c59a..337b5289cf11 100644 >> --- a/fs/ext4/fast_commit.c >> +++ b/fs/ext4/fast_commit.c >> @@ -396,12 +396,19 @@ static int ext4_fc_track_template( >> return ret; >> >> spin_lock(&sbi->s_fc_lock); >> - if (list_empty(&EXT4_I(inode)->i_fc_list)) >> - list_add_tail(&EXT4_I(inode)->i_fc_list, >> - (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING || >> - sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ? >> - &sbi->s_fc_q[FC_Q_STAGING] : >> - &sbi->s_fc_q[FC_Q_MAIN]); >> + if (sbi->s_journal->j_flags & JBD2_FULL_COMMIT_ONGOING || >> + sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) { >> + if (list_empty(&EXT4_I(inode)->i_fc_list)) >> + list_add_tail(&EXT4_I(inode)->i_fc_list, >> + &sbi->s_fc_q[FC_Q_STAGING]); >> + else >> + list_move_tail(&EXT4_I(inode)->i_fc_list, >> + &sbi->s_fc_q[FC_Q_STAGING]); > > So I'm not sure this is actually safe. I'm concerned about the following > race: > > Task1 Task2 > > handle = ext4_journal_start(..) > modify inode_X > ext4_fc_track_inode(inode_X) > ext4_fsync(inode_X) > ext4_fc_commit() > jbd2_fc_begin_commit() > journal->j_flags |= JBD2_FAST_COMMIT_ONGOING; > ... > jbd2_journal_lock_updates() > blocks waiting for handle of Task2 > modify inode_X > ext4_fc_track_inode(inode_X) > - moves inode out of FC_Q_MAIN > ext4_journal_stop() > fast commit proceeds but skips inode_X... Hmm... I see, the problem is deeper that I thought. > How we deal with a similar issue in jbd2 for ordinary buffers is that we > just mark the buffer as *also* belonging to the next transaction (by > setting jh->b_next_transaction) and during commit cleanup we move the bh to > the appropriate list of the next transaction. Here, we could mark the inode > as also being part of the next fast commit and during fastcommit cleanup we > could move it to FC_Q_STAGING which is then spliced back to FC_Q_MAIN. Yeah, I guess that would work. I'll need to add a new field to flag the 'next commit' in struct ext4_inode_info. I'll need to play a bit with it and see what I can came up with. Thanks for the suggestion. > Also Harshad has recently posted changes to fast commit code that modify > how fast commits are serialized (in particular jbd2_journal_lock_updates() > is gone). I didn't read them yet but your change definitely needs a careful > verification against those changes to make sure we don't introduce new data > integrity issues. > Right, I saw his patchset only after sending my RFC (and I should have probably included him on the CC as well; probably get_maintainer.pl isn't picking his email). I'll need to look at those changes too, which will probably take me some time as most of that code isn't familiar to me. Thanks a lot for your review, Jan. Much appreciated. Cheers, -- Luis >> + } else { >> + if (list_empty(&EXT4_I(inode)->i_fc_list)) >> + list_add_tail(&EXT4_I(inode)->i_fc_list, >> + &sbi->s_fc_q[FC_Q_MAIN]); >> + } >> spin_unlock(&sbi->s_fc_lock); >> >> return ret; > > Honza > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR