It can theoretically happen that in data=ordered mode an inode is filed
to a transaction's t_inode_list, then the flusher thread writes all the
data and the inode is reclaimed before the transaction starts to commit.
In such a case we could erroneously omit sending a flush to the
filesystem device when it is different from the journal device (because
the data can still be only in the disk cache).

Fix the problem by setting a flag in a transaction when some inode is
added to it and then sending a disk flush in the commit code when the
flag is set.

Signed-off-by: Jan Kara <jack@xxxxxxx>
---
 fs/jbd2/commit.c      |    3 +--
 fs/jbd2/transaction.c |    7 +++++++
 include/linux/jbd2.h  |    4 +++-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index 6e28000..8bd8790 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -219,7 +219,6 @@ static int journal_submit_data_buffers(journal_t *journal,
 			ret = err;
 		spin_lock(&journal->j_list_lock);
 		J_ASSERT(jinode->i_transaction == commit_transaction);
-		commit_transaction->t_flushed_data_blocks = 1;
 		clear_bit(__JI_COMMIT_RUNNING, &jinode->i_flags);
 		smp_mb__after_clear_bit();
 		wake_up_bit(&jinode->i_flags, __JI_COMMIT_RUNNING);
@@ -683,7 +682,7 @@ start_journal_io:
 	 * then we must flush the file system device before we issue
 	 * the commit record
 	 */
-	if (commit_transaction->t_flushed_data_blocks &&
+	if (commit_transaction->t_need_data_flush &&
 	    (journal->j_fs_dev != journal->j_dev) &&
 	    (journal->j_flags & JBD2_BARRIER))
 		blkdev_issue_flush(journal->j_fs_dev, GFP_KERNEL, NULL);
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 05fa77a..7f70390 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -2147,6 +2147,13 @@ int jbd2_journal_file_inode(handle_t *handle, struct jbd2_inode *jinode)
 	    jinode->i_next_transaction == transaction)
 		goto done;
 
+	/*
+	 * We only ever set this variable to 1 so the test is safe. Since
+	 * t_need_data_flush is likely to be set, we do the test to save some
+	 * cacheline bouncing
+	 */
+	if (!transaction->t_need_data_flush)
+		transaction->t_need_data_flush = 1;
 	/* On some different transaction's list - should be
 	 * the committing one */
 	if (jinode->i_transaction) {
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index a32dcae..4d57955 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -658,7 +658,9 @@ struct transaction_s
 	 * waiting for it to finish.
 	 */
 	unsigned int t_synchronous_commit:1;
-	unsigned int t_flushed_data_blocks:1;
+
+	/* Disk flush needs to be sent to fs partition [no locking] */
+	int t_need_data_flush;
 
 	/*
 	 * For use by the filesystem to store fs-specific data
-- 
1.7.1