On Sun 22-05-11 17:13:45, Ted Tso wrote: > From: Jan Kara <jack@xxxxxxx> > > jbd2_log_start_commit() returns 1 only when we really start a > transaction. But we also need to wait for a transaction when the > commit is already running. Fix this problem by waiting for > transaction commit unconditionally (which is just a quick check if the > transaction is already committed). > > Also we have to be more careful with sending of a barrier because when > transaction is being committed in parallel to ext4_sync_file() > running, we cannot be sure that the barrier the journalling code sends > happens after we wrote all the data for fsync (note that not every > data writeout needs to trigger metadata changes thus commit of some > metadata changes can be running while other data is still written > out). So use jbd2_will_send_data_barrier() helper to detect the common > cases when we can be sure barrier will be issued by the commit code > and issue the barrier ourselves in the remaining cases. > > [ Modified by tytso so that the external journal cases are handled in > ext4_sync_file() to avoid needlessly issuing extra flush requests in > the data=ordered and data=journalled cases. ] Well, in data=journal case I agree your change will work (but that's your minor concern I guess). In data=ordered it's harder: a) The flush of j_fs_dev in jbd2_journal_commit_transaction() is issued earlier than we set T_COMMIT_RECORD but that's easy to handle. b) Whether we do or don't send the flush in jbd2_journal_commit_transaction() depends on whether t_flushed_data_blocks is set. We can't know in advance whether it gets set or not because it depends on whether some inode is in transaction's t_inode_list and inodes can get removed from there when flusher thread has written all the pages and inode has been reclaimed. OTOH this looks like a bug in the commit code anyway - I guess t_flushed_data_blocks (or better named equivalent) should be set in jbd2_journal_file_inode(). Then such variable will also become a reliable indicator whether the data flush is going to be sent or not. I'll update the patch set to reflect this... Honza > Signed-off-by: Jan Kara <jack@xxxxxxx> > Signed-off-by: "Theodore Ts'o" <tytso@xxxxxxx> > --- > fs/ext4/fsync.c | 31 ++++++++++++++++++++----------- > 1 files changed, 20 insertions(+), 11 deletions(-) > > diff --git a/fs/ext4/fsync.c b/fs/ext4/fsync.c > index 60fe572..b0e03fa 100644 > --- a/fs/ext4/fsync.c > +++ b/fs/ext4/fsync.c > @@ -172,6 +172,7 @@ int ext4_sync_file(struct file *file, int datasync) > journal_t *journal = EXT4_SB(inode->i_sb)->s_journal; > int ret; > tid_t commit_tid; > + bool needs_barrier = false; > > J_ASSERT(ext4_journal_current_handle() == NULL); > > @@ -211,22 +212,30 @@ int ext4_sync_file(struct file *file, int datasync) > } > > commit_tid = datasync ? ei->i_datasync_tid : ei->i_sync_tid; > - if (jbd2_log_start_commit(journal, commit_tid)) { > + if (journal->j_flags & JBD2_BARRIER) { > /* > * When the journal is on a different device than the > - * fs data disk, we need to issue the barrier in > - * writeback mode. (In ordered mode, the jbd2 layer > - * will take care of issuing the barrier. In > + * fs data disk, when data=writeback, we need to issue > + * a barrier unconditionally. (In ordered mode, the > + * jbd2 layer will take care of issuing the barrier if > + * there were any writes associated with the inode; in > * data=journal, all of the data blocks are written to > * the journal device.) > */ > - if (ext4_should_writeback_data(inode) && > - (journal->j_fs_dev != journal->j_dev) && > - (journal->j_flags & JBD2_BARRIER)) > - blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, > - NULL); > - ret = jbd2_log_wait_commit(journal, commit_tid); > - } else if (journal->j_flags & JBD2_BARRIER) > + if ((journal->j_fs_dev != journal->j_dev) && > + ext4_should_writeback_data(inode)) > + needs_barrier = true; > + else if (!jbd2_trans_will_send_data_barrier(journal, > + commit_tid)) > + /* > + * If the journal layer isn't going to issue > + * the barrier, then we'd better. > + */ > + needs_barrier = true; > + } > + jbd2_log_start_commit(journal, commit_tid); > + ret = jbd2_log_wait_commit(journal, commit_tid); > + if (needs_barrier) > blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL, NULL); > out: > trace_ext4_sync_file_exit(inode, ret); > -- > 1.7.3.1 > -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html