When a full journal commit is ongoing, any fast commit has to be enqueued
into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing
is done only once, i.e. if an inode is already queued in a previous fast
commit entry it won't be enqueued again. However, if a full commit starts
_after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
be done into FC_Q_STAGING. And this is not being done in function
ext4_fc_track_template().

This patch fixes the issue by flagging an inode that is already enqueued in
either queue. Later, during the fast commit clean-up callback, if the
inode has a tid that is bigger than the one being handled, that inode is
re-enqueued into STAGING and then spliced back into MAIN.

This bug was found using fstest generic/047. This test creates several
32k-byte files, sync'ing each of them after its creation, and then shuts
down the filesystem. Some data may be lost in this operation; for example,
a file may have its size truncated to zero.

Signed-off-by: Luis Henriques (SUSE) <luis.henriques@xxxxxxxxx>
---
Hi!

(Now Cc'ing Harshad, as I should have done in the initial RFC.)

This v2 is a completely different solution, hinted at by Jan Kara. I hope
my understanding of his suggestion is correct. Also, I've dropped the
second patch as it didn't make sense, as Jan also pointed out.

Finally, I haven't yet done a review of Harshad's patchset [1] (hope to get
to it soon), but a quick test shows the issue is still present there. The
good news is that this patch can be trivially applied on top of it.

[1] https://lore.kernel.org/all/20240520055153.136091-1-harshadshirwadkar@xxxxxxxxx

Cheers,
--
Luis

 fs/ext4/ext4.h        | 11 ++++++++++-
 fs/ext4/fast_commit.c | 11 +++++++++++
 fs/ext4/super.c       |  1 +
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 983dad8c07ec..4c308c18c3da 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1062,9 +1062,18 @@ struct ext4_inode_info {
 	/* Fast commit wait queue for this inode */
 	wait_queue_head_t i_fc_wait;
 
-	/* Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len */
+	/*
+	 * Protect concurrent accesses on i_fc_lblk_start, i_fc_lblk_len,
+	 * i_fc_next
+	 */
 	struct mutex i_fc_lock;
 
+	/*
+	 * Used to flag an inode as part of the next fast commit; will be
+	 * reset during fast commit clean-up
+	 */
+	tid_t i_fc_next;
+
 	/*
 	 * i_disksize keeps track of what the inode size is ON DISK, not
 	 * in memory. During truncate, i_size is set to the new size by
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 87c009e0c59a..bfdf249f0783 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -402,6 +402,8 @@ static int ext4_fc_track_template(
 				 sbi->s_journal->j_flags & JBD2_FAST_COMMIT_ONGOING) ?
 				&sbi->s_fc_q[FC_Q_STAGING] :
 				&sbi->s_fc_q[FC_Q_MAIN]);
+	else
+		ei->i_fc_next = tid;
 	spin_unlock(&sbi->s_fc_lock);
 
 	return ret;
@@ -1280,6 +1282,15 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
 	list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN],
 				 i_fc_list) {
 		list_del_init(&iter->i_fc_list);
+		if (iter->i_fc_next == tid)
+			iter->i_fc_next = 0;
+		else if (iter->i_fc_next > tid)
+			/*
+			 * re-enqueue inode into STAGING, which will later be
+			 * spliced back into MAIN
+			 */
+			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
+				      &sbi->s_fc_q[FC_Q_STAGING]);
 		ext4_clear_inode_state(&iter->vfs_inode,
 				       EXT4_STATE_FC_COMMITTING);
 		if (iter->i_sync_tid <= tid)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 893ab80dafba..56f416656d96 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1437,6 +1437,7 @@ static struct inode *ext4_alloc_inode(struct super_block *sb)
 	INIT_WORK(&ei->i_rsv_conversion_work, ext4_end_io_rsv_work);
 	ext4_fc_init_inode(&ei->vfs_inode);
 	mutex_init(&ei->i_fc_lock);
+	ei->i_fc_next = 0;
 	return &ei->vfs_inode;
 }
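
For anyone who wants to reason about the flag/re-enqueue logic outside the kernel, here is a minimal user-space sketch of it. It is only a model under simplifying assumptions, not kernel code: the names model_inode, model_queue, model_track(), model_cleanup() and model_splice() are hypothetical, list membership is reduced to a single enum, locking is omitted, and the two "commit ongoing" journal flags are collapsed into one boolean.

```c
#include <stdbool.h>

/* jbd2 defines tid_t as an unsigned int; modelled the same way here. */
typedef unsigned int tid_t;

/* Hypothetical stand-in for membership on one of the sbi->s_fc_q[] lists. */
enum model_queue { MODEL_Q_NONE, MODEL_Q_MAIN, MODEL_Q_STAGING };

struct model_inode {
	enum model_queue queue;	/* which fast commit queue holds the inode */
	tid_t fc_next;		/* mirrors the new i_fc_next field */
};

/* Mirrors the enqueue step in ext4_fc_track_template() with this patch. */
static void model_track(struct model_inode *ei, tid_t tid, bool commit_ongoing)
{
	if (ei->queue == MODEL_Q_NONE)
		ei->queue = commit_ongoing ? MODEL_Q_STAGING : MODEL_Q_MAIN;
	else
		ei->fc_next = tid;	/* already queued: only flag it */
}

/* Mirrors the per-inode work of the FC_Q_MAIN loop in ext4_fc_cleanup(). */
static void model_cleanup(struct model_inode *ei, tid_t tid)
{
	if (ei->queue != MODEL_Q_MAIN)
		return;
	ei->queue = MODEL_Q_NONE;		/* list_del_init() */
	if (ei->fc_next == tid)
		ei->fc_next = 0;
	else if (ei->fc_next > tid)		/* plain '>' as in the patch */
		ei->queue = MODEL_Q_STAGING;	/* re-enqueue for next commit */
}

/* Mirrors the later splice of FC_Q_STAGING back into FC_Q_MAIN. */
static void model_splice(struct model_inode *ei)
{
	if (ei->queue == MODEL_Q_STAGING)
		ei->queue = MODEL_Q_MAIN;
}
```

In this model, an inode tracked at tid 5 and tracked again at tid 6 while still queued ends up in STAGING after model_cleanup(&ino, 5), and back in MAIN after model_splice(&ino), which is the behaviour the commit message describes.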