Re: [PATCH v10 4/9] jbd2: add fast commit machinery

Jan Kara <jack@xxxxxxx> · Mon, 26 Oct 2020 10:03:50 +0100

On Fri 23-10-20 10:17:18, harshad shirwadkar wrote:
> Thanks Jan for reviewing the patches.

You're welcome. Rather I'm sorry that I've got to that after so long time.

> On Thu, Oct 22, 2020 at 3:16 AM Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Thu 15-10-20 13:37:56, Harshad Shirwadkar wrote:
> > > This functions adds necessary APIs needed in JBD2 layer for fast
> > > commits.
> > >
> > > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx>
> > > ---
> > >  fs/ext4/fast_commit.c |   8 ++
> > >  fs/jbd2/commit.c      |  44 ++++++++++
> > >  fs/jbd2/journal.c     | 190 +++++++++++++++++++++++++++++++++++++++++-
> > >  include/linux/jbd2.h  |  27 ++++++
> > >  4 files changed, 268 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
> > > index 0dad8bdb1253..f2d11b4c6b62 100644
> > > --- a/fs/ext4/fast_commit.c
> > > +++ b/fs/ext4/fast_commit.c
> > > @@ -8,11 +8,19 @@
> > >   * Ext4 fast commits routines.
> > >   */
> > >  #include "ext4_jbd2.h"
> > > +/*
> > > + * Fast commit cleanup routine. This is called after every fast commit and
> > > + * full commit. full is true if we are called after a full commit.
> > > + */
> > > +static void ext4_fc_cleanup(journal_t *journal, int full)
> > > +{
> > > +}
> > >
> > >  void ext4_fc_init(struct super_block *sb, journal_t *journal)
> > >  {
> > >       if (!test_opt2(sb, JOURNAL_FAST_COMMIT))
> > >               return;
> > > +     journal->j_fc_cleanup_callback = ext4_fc_cleanup;
> > >       if (jbd2_fc_init(journal, EXT4_NUM_FC_BLKS)) {
> > >               pr_warn("Error while enabling fast commits, turning off.");
> > >               ext4_clear_feature_fast_commit(sb);
> > > diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> > > index 6252b4c50666..fa688e163a80 100644
> > > --- a/fs/jbd2/commit.c
> > > +++ b/fs/jbd2/commit.c
> > > @@ -206,6 +206,30 @@ int jbd2_journal_submit_inode_data_buffers(struct jbd2_inode *jinode)
> > >       return generic_writepages(mapping, &wbc);
> > >  }
> > >
> > > +/* Send all the data buffers related to an inode */
> > > +int jbd2_submit_inode_data(struct jbd2_inode *jinode)
> > > +{
> > > +
> > > +     if (!jinode || !(jinode->i_flags & JI_WRITE_DATA))
> > > +             return 0;
> > > +
> > > +     trace_jbd2_submit_inode_data(jinode->i_vfs_inode);
> > > +     return jbd2_journal_submit_inode_data_buffers(jinode);
> > > +
> > > +}
> > > +EXPORT_SYMBOL(jbd2_submit_inode_data);
> > > +
> > > +int jbd2_wait_inode_data(journal_t *journal, struct jbd2_inode *jinode)
> > > +{
> > > +     if (!jinode || !(jinode->i_flags & JI_WAIT_DATA) ||
> > > +             !jinode->i_vfs_inode || !jinode->i_vfs_inode->i_mapping)
> > > +             return 0;
> > > +     return filemap_fdatawait_range_keep_errors(
> > > +             jinode->i_vfs_inode->i_mapping, jinode->i_dirty_start,
> > > +             jinode->i_dirty_end);
> > > +}
> > > +EXPORT_SYMBOL(jbd2_wait_inode_data);
> > > +
> > >  /*
> > >   * Submit all the data buffers of inode associated with the transaction to
> > >   * disk.
> > > @@ -415,6 +439,20 @@ void jbd2_journal_commit_transaction(journal_t *journal)
> > >       J_ASSERT(journal->j_running_transaction != NULL);
> > >       J_ASSERT(journal->j_committing_transaction == NULL);
> > >
> > > +     write_lock(&journal->j_state_lock);
> > > +     journal->j_flags |= JBD2_FULL_COMMIT_ONGOING;
> > > +     while (journal->j_flags & JBD2_FAST_COMMIT_ONGOING) {
> > > +             DEFINE_WAIT(wait);
> > > +
> > > +             prepare_to_wait(&journal->j_fc_wait, &wait,
> > > +                             TASK_UNINTERRUPTIBLE);
> > > +             write_unlock(&journal->j_state_lock);
> > > +             schedule();
> > > +             write_lock(&journal->j_state_lock);
> > > +             finish_wait(&journal->j_fc_wait, &wait);
> > > +     }
> > > +     write_unlock(&journal->j_state_lock);
> >
> > Hum, I'd like to understand: Is there a reason to block fastcommits already
> > when the running transaction is in T_LOCKED state? Strictly speaking it is
> > necessary only once we get to T_FLUSH state AFAIU (because only then we
> > start to write transaction to the journal). I guess there are both
> > advantages and disadvantages to it - if we allowed fastcommits running in
> > T_LOCKED state, we could lower fsync() latency more. OTOH it could increase
> > commit latency because we'd have to wait for fastcommits after T_LOCKED
> > state.
> That's right. I thought given that the transaction is anyway entering
> locked state, might as well wait for it to complete instead of writing
> blocks that are going to be obsoleted immediately. Also note that this
> full commit could have started due to fast commits being in ineligible
> state. If that's the case, the fast commit code will realize that it
> can't do much and it will again wait for a full commit. So, even
> though there is a fsync latency benefit to waiting till T_FLUSH, I'd
> still marginally prefer blocking fast commits once the transaction
> enters T_LOCKED state.

OK, makes sence. We can always change it if we find good performance
reasons for other choice.

> > > +}
> > > +EXPORT_SYMBOL(jbd2_fc_end_commit);
> > > +
> > > +int jbd2_fc_end_commit_fallback(journal_t *journal, tid_t tid)
> > > +{
> > > +     return __jbd2_fc_end_commit(journal, tid, 1);
> > > +}
> > > +EXPORT_SYMBOL(jbd2_fc_end_commit_fallback);
> > > +
> >
> > Is there a need for 'tid' here? Once jbd2_fc_begin_commit() sets
> > JBD2_FAST_COMMIT_ONGOING normal commit cannot proceed so when we decide we
> > cannot do fastcommit in the end, we know the transaction that needs to
> > commit is the currently running transaction, so we can fetch its TID from
> > the journal once we hold j_state_lock before clearing
> > JBD2_FAST_COMMIT_ONGOING. Cannot we?

Did you miss this comment?

> > >  /* Return 1 when transaction with given tid has already committed. */
> > >  int jbd2_transaction_committed(journal_t *journal, tid_t tid)
> > >  {
> > > @@ -784,6 +855,110 @@ int jbd2_journal_next_log_block(journal_t *journal, unsigned long long *retp)
> > >       return jbd2_journal_bmap(journal, blocknr, retp);
> > >  }
> > >
> > > +/* Map one fast commit buffer for use by the file system */
> > > +int jbd2_fc_get_buf(journal_t *journal, struct buffer_head **bh_out)
> > > +{
> > > +     unsigned long long pblock;
> > > +     unsigned long blocknr;
> > > +     int ret = 0;
> > > +     struct buffer_head *bh;
> > > +     int fc_off;
> > > +
> > > +     *bh_out = NULL;
> > > +     write_lock(&journal->j_state_lock);
> > > +
> > > +     if (journal->j_fc_off + journal->j_fc_first < journal->j_fc_last) {
> > > +             fc_off = journal->j_fc_off;
> > > +             blocknr = journal->j_fc_first + fc_off;
> > > +             journal->j_fc_off++;
> > > +     } else {
> > > +             ret = -EINVAL;
> > > +     }
> > > +     write_unlock(&journal->j_state_lock);
> >
> > Is j_state_lock really needed here? There is always only one process doing
> > fastcommit so nobody else should be touching j_fc_off and other fields. Or
> > am I missing something?
> You are right, there should only be one process calling
> jbd2_fc_get_buf. I'll fix this.

Maybe add a comment to j_fc_off & co. that they are not protected by any
lock - only by the fact that there's always only a single process doing
fastcommit.

> > > +
> > > +     /*
> > > +      * Wait in reverse order to minimize chances of us being woken up before
> > > +      * all IOs have completed
> > > +      */
> > > +     for (i = j_fc_off - 1; i >= j_fc_off - num_blks; i--) {
> > > +             bh = journal->j_fc_wbuf[i];
> > > +             wait_on_buffer(bh);
> > > +             put_bh(bh);
> > > +             journal->j_fc_wbuf[i] = NULL;
> > > +             if (unlikely(!buffer_uptodate(bh)))
> > > +                     return -EIO;
> > > +     }
> > > +
> > > +     return 0;
> > > +}
> > > +EXPORT_SYMBOL(jbd2_fc_wait_bufs);
> > > +
> > > +/*
> > > + * Wait on fast commit buffers that were allocated by jbd2_fc_get_buf
> > > + * for completion.
> > > + */
> > > +int jbd2_fc_release_bufs(journal_t *journal)
> > > +{
> > > +     struct buffer_head *bh;
> > > +     int i, j_fc_off;
> > > +
> > > +     read_lock(&journal->j_state_lock);
> > > +     j_fc_off = journal->j_fc_off;
> > > +     read_unlock(&journal->j_state_lock);
> > > +
> > > +     /*
> > > +      * Wait in reverse order to minimize chances of us being woken up before
> > > +      * all IOs have completed
> > > +      */
> > > +     for (i = j_fc_off - 1; i >= 0; i--) {
> > > +             bh = journal->j_fc_wbuf[i];
> > > +             if (!bh)
> > > +                     break;
> > > +             put_bh(bh);
> > > +             journal->j_fc_wbuf[i] = NULL;
> > > +     }
> > > +
> > > +     return 0;
> > > +}
> > > +EXPORT_SYMBOL(jbd2_fc_release_bufs);
> > > +
> >
> > I kind of wonder if releasing of buffers shouldn't be done automatically
> > either as part of jbd2_fc_wait_bufs() or when ending fastcommit. But I
> > don't have a strong opinion so this is just an idea for consideration.
> So, that's what I do. The buffers get released in jbd2_fc_wait_bufs().
> However, in case of errors or fallback to full commits, buffers may
> not be submitted and thus won't be released. So this function is to
> release all the unsubmitted buffers. This gets called from the cleanup
> callback which is called after every successful or failed full commit
> or fast commit.

Aha, I missed that. Thanks for explanation.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR