On Mon, Oct 26, 2020 at 2:03 AM Jan Kara <jack@xxxxxxx> wrote: > > On Fri 23-10-20 10:17:18, harshad shirwadkar wrote: > > Thanks Jan for reviewing the patches. > > You're welcome. Rather I'm sorry that I've got to that after so long time. > > > On Thu, Oct 22, 2020 at 3:16 AM Jan Kara <jack@xxxxxxx> wrote: > > > > > > On Thu 15-10-20 13:37:56, Harshad Shirwadkar wrote: > > > > This functions adds necessary APIs needed in JBD2 layer for fast > > > > commits. > > > > > > > > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx> > > > > --- > > > > fs/ext4/fast_commit.c | 8 ++ > > > > fs/jbd2/commit.c | 44 ++++++++++ > > > > fs/jbd2/journal.c | 190 +++++++++++++++++++++++++++++++++++++++++- > > > > include/linux/jbd2.h | 27 ++++++ > > > > 4 files changed, 268 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c > > > > index 0dad8bdb1253..f2d11b4c6b62 100644 > > > > --- a/fs/ext4/fast_commit.c > > > > +++ b/fs/ext4/fast_commit.c > > > > @@ -8,11 +8,19 @@ > > > > * Ext4 fast commits routines. > > > > */ > > > > #include "ext4_jbd2.h" > > > > +/* > > > > + * Fast commit cleanup routine. This is called after every fast commit and > > > > + * full commit. full is true if we are called after a full commit. > > > > + */ > > > > +static void ext4_fc_cleanup(journal_t *journal, int full) > > > > +{ > > > > +} > > > > > > > > void ext4_fc_init(struct super_block *sb, journal_t *journal) > > > > { > > > > if (!test_opt2(sb, JOURNAL_FAST_COMMIT)) > > > > return; > > > > + journal->j_fc_cleanup_callback = ext4_fc_cleanup; > > > > if (jbd2_fc_init(journal, EXT4_NUM_FC_BLKS)) { > > > > pr_warn("Error while enabling fast commits, turning off."); > > > > ext4_clear_feature_fast_commit(sb); > > > > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c > > > > index 6252b4c50666..fa688e163a80 100644 > > > > --- a/fs/jbd2/commit.c > > > > +++ b/fs/jbd2/commit.c > > > > @@ -206,6 +206,30 @@ int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode) > > > > return generic_writepages(mapping, &wbc); > > > > } > > > > > > > > +/* Send all the data buffers related to an inode */ > > > > +int jbd2_submit_inode_data(struct jbd2_inode *jinode) > > > > +{ > > > > + > > > > + if (!jinode || !(jinode->i_flags & JI_WRITE_DATA)) > > > > + return 0; > > > > + > > > > + trace_jbd2_submit_inode_data(jinode->i_vfs_inode); > > > > + return jbd2_journal_submit_inode_data_buffers(jinode); > > > > + > > > > +} > > > > +EXPORT_SYMBOL(jbd2_submit_inode_data); > > > > + > > > > +int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode) > > > > +{ > > > > + if (!jinode || !(jinode->i_flags & JI_WAIT_DATA) || > > > > + !jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping) > > > > + return 0; > > > > + return filemap_fdatawait_range_keep_errors( > > > > + jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start, > > > > + jinode->i_dirty_end); > > > > +} > > > > +EXPORT_SYMBOL(jbd2_wait_inode_data); > > > > + > > > > /* > > > > * Submit all the data buffers of inode associated with the transaction to > > > > * disk. > > > > @@ -415,6 +439,20 @@ void jbd2_journal_commit_transaction(journal_t *journal) > > > > J_ASSERT(journal->j_running_transaction != NULL); > > > > J_ASSERT(journal->j_committing_transaction == NULL); > > > > > > > > + write_lock(&journal->j_state_lock); > > > > + journal->j_flags |= JBD2_FULL_COMMIT_ONGOING; > > > > + while (journal->j_flags & JBD2_FAST_COMMIT_ONGOING) { > > > > + DEFINE_WAIT(wait); > > > > + > > > > + prepare_to_wait(&journal->j_fc_wait, &wait, > > > > + TASK_UNINTERRUPTIBLE); > > > > + write_unlock(&journal->j_state_lock); > > > > + schedule(); > > > > + write_lock(&journal->j_state_lock); > > > > + finish_wait(&journal->j_fc_wait, &wait); > > > > + } > > > > + write_unlock(&journal->j_state_lock); > > > > > > Hum, I'd like to understand: Is there a reason to block fastcommits already > > > when the running transaction is in T_LOCKED state? Strictly speaking it is > > > necessary only once we get to T_FLUSH state AFAIU (because only then we > > > start to write transaction to the journal). I guess there are both > > > advantages and disadvantages to it - if we allowed fastcommits running in > > > T_LOCKED state, we could lower fsync() latency more. OTOH it could increase > > > commit latency because we'd have to wait for fastcommits after T_LOCKED > > > state. > > That's right. I thought given that the transaction is anyway entering > > locked state, might as well wait for it to complete instead of writing > > blocks that are going to be obsoleted immediately. Also note that this > > full commit could have started due to fast commits being in ineligible > > state. If that's the case, the fast commit code will realize that it > > can't do much and it will again wait for a full commit. So, even > > though there is a fsync latency benefit to waiting till T_FLUSH, I'd > > still marginally prefer blocking fast commits once the transaction > > enters T_LOCKED state. > > OK, makes sence. We can always change it if we find good performance > reasons for other choice. Sounds good! > > > > > +} > > > > +EXPORT_SYMBOL(jbd2_fc_end_commit); > > > > + > > > > +int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid) > > > > +{ > > > > + return __jbd2_fc_end_commit(journal, tid, 1); > > > > +} > > > > +EXPORT_SYMBOL(jbd2_fc_end_commit_fallback); > > > > + > > > > > > Is there a need for 'tid' here? Once jbd2_fc_begin_commit() sets > > > JBD2_FAST_COMMIT_ONGOING normal commit cannot proceed so when we decide we > > > cannot do fastcommit in the end, we know the transaction that needs to > > > commit is the currently running transaction, so we can fetch its TID from > > > the journal once we hold j_state_lock before clearing > > > JBD2_FAST_COMMIT_ONGOING. Cannot we? > > Did you miss this comment? Oops. Yeah, I did. Sorry for that. Yes, when we call this function, we know that the TID that needs to be committed is the current running transaction. I'll fix this. > > > > > /* Return 1 when transaction with given tid has already committed. */ > > > > int jbd2_transaction_committed(journal_t *journal, tid_t tid) > > > > { > > > > @@ -784,6 +855,110 @@ int jbd2_journal_next_log_block(journal_t *journal, unsigned long long *retp) > > > > return jbd2_journal_bmap(journal, blocknr, retp); > > > > } > > > > > > > > +/* Map one fast commit buffer for use by the file system */ > > > > +int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out) > > > > +{ > > > > + unsigned long long pblock; > > > > + unsigned long blocknr; > > > > + int ret = 0; > > > > + struct buffer_head *bh; > > > > + int fc_off; > > > > + > > > > + *bh_out = NULL; > > > > + write_lock(&journal->j_state_lock); > > > > + > > > > + if (journal->j_fc_off + journal->j_fc_first < journal->j_fc_last) { > > > > + fc_off = journal->j_fc_off; > > > > + blocknr = journal->j_fc_first + fc_off; > > > > + journal->j_fc_off++; > > > > + } else { > > > > + ret = -EINVAL; > > > > + } > > > > + write_unlock(&journal->j_state_lock); > > > > > > Is j_state_lock really needed here? There is always only one process doing > > > fastcommit so nobody else should be touching j_fc_off and other fields. Or > > > am I missing something? > > You are right, there should only be one process calling > > jbd2_fc_get_buf. I'll fix this. > > Maybe add a comment to j_fc_off & co. that they are not protected by any > lock - only by the fact that there's always only a single process doing > fastcommit. Ack Thanks, Harshad > > > > > + > > > > + /* > > > > + * Wait in reverse order to minimize chances of us being woken up before > > > > + * all IOs have completed > > > > + */ > > > > + for (i = j_fc_off - 1; i >= j_fc_off - num_blks; i--) { > > > > + bh = journal->j_fc_wbuf[i]; > > > > + wait_on_buffer(bh); > > > > + put_bh(bh); > > > > + journal->j_fc_wbuf[i] = NULL; > > > > + if (unlikely(!buffer_uptodate(bh))) > > > > + return -EIO; > > > > + } > > > > + > > > > + return 0; > > > > +} > > > > +EXPORT_SYMBOL(jbd2_fc_wait_bufs); > > > > + > > > > +/* > > > > + * Wait on fast commit buffers that were allocated by jbd2_fc_get_buf > > > > + * for completion. > > > > + */ > > > > +int jbd2_fc_release_bufs(journal_t *journal) > > > > +{ > > > > + struct buffer_head *bh; > > > > + int i, j_fc_off; > > > > + > > > > + read_lock(&journal->j_state_lock); > > > > + j_fc_off = journal->j_fc_off; > > > > + read_unlock(&journal->j_state_lock); > > > > + > > > > + /* > > > > + * Wait in reverse order to minimize chances of us being woken up before > > > > + * all IOs have completed > > > > + */ > > > > + for (i = j_fc_off - 1; i >= 0; i--) { > > > > + bh = journal->j_fc_wbuf[i]; > > > > + if (!bh) > > > > + break; > > > > + put_bh(bh); > > > > + journal->j_fc_wbuf[i] = NULL; > > > > + } > > > > + > > > > + return 0; > > > > +} > > > > +EXPORT_SYMBOL(jbd2_fc_release_bufs); > > > > + > > > > > > I kind of wonder if releasing of buffers shouldn't be done automatically > > > either as part of jbd2_fc_wait_bufs() or when ending fastcommit. But I > > > don't have a strong opinion so this is just an idea for consideration. > > So, that's what I do. The buffers get released in jbd2_fc_wait_bufs(). > > However, in case of errors or fallback to full commits, buffers may > > not be submitted and thus won't be released. So this function is to > > release all the unsubmitted buffers. This gets called from the cleanup > > callback which is called after every successful or failed full commit > > or fast commit. > > Aha, I missed that. Thanks for explanation. > > Honza > -- > Jan Kara <jack@xxxxxxxx> > SUSE Labs, CR