Re: [PATCH v2 05/12] jbd2: fast-commit commit path new APIs

harshad shirwadkar <harshadshirwadkar@xxxxxxxxx> · Mon, 12 Aug 2019 10:41:48 -0700

Thanks Andreas and Ted for the review.

Yeah, this makes sense.

On Mon, Aug 12, 2019 at 9:04 AM Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
>
> On Thu, Aug 08, 2019 at 08:45:45PM -0700, Harshad Shirwadkar wrote:
> > This patch adds new helper APIs that ext4 needs for fast
> > commits. These new fast commit APIs are used by subsequent fast commit
> > patches to implement fast commits. Following new APIs are added:
> >
> > /*
> >  * Returns when either a full commit or a fast commit
> >  * completes
> >  */
> > int jbd2_fc_complete_commit(journal_tc *journal, tid_t tid,
> >                           tid_t tid, tid_t subtid)
>
> I think there is an opportunity to do something more efficient.
>
> Right now, the ext4_fsync() calls this function, and the file system
> can only do a "fast commit" if all of the modifications made to the
> file system to date are "fast commit eligible".  Otherwise, we have to
> fall back to a normal, slow commit.
>
> We can make this decision on a much more granular level.  Suppose that
> so far during the life of the current transaction, inodes A, B, and C
> have been modified.  The modification to inode A is not fast commit
> eligible (maybe the inode is deleted, or it is involved in a directory
> rename, etc.).  The modification to inode B is fast commit eligible,
> but an fsync was not requested for it.  And the modification to inode
> C *is* fast commit eligble, *and* fsync() has been requested for it.
>
> We only need to write the information for inode C to the fast commit
> area.  The fact that inode A is not fast commit eligible isn't a
> problem.  It will get committed when the normal transaction closes,
> perhaps when the 5 second commit transaction timer expires.  And inode
> B, even though its changes might be fast commit eligible, might
> require writing a large number of data blocks if it were included in
> the fast commit.  So excluding inodes A and B from the fast commit,
> and only writing the logical changes corresponding to the those made
> to inode C, will allow a fast commit to take place.
>
> In order to do that, though, the ext4's fast commit machinery needs to
> know which inode we actually need to do the fast commit for.  And so
> for that reason, it's actually probably better not to run the changes
> through the commit thread.  That makes it harder to plumb the file
> system specific information through, and it also requires waking up
> the commit thread and waiting for it to get scheduled.
I see, so you mean each fsync() call will result in exactly one inode
to be committed (the inode on which fsync was called), right? I agree
this doesn't need to go through JBD2 but we need a mechanism to inform
JBD2 about this fast commit since JBD2 maintains sub-transaction ID.
JBD2 will in turn need to make sure that a subtid was allocated for
such a fast commit and it was incremented once the fast commit was
successful as well.
>
> Instead, ext4_fsync() could just call the fast commit machinery, and
> the only thing we need to expose is a way for the fast commit
> machinery to attempt to grab a mutex preventing the normal commit
> thread from starting a normal commit.  If it loses the race, and the
> normal commit takes place before we manage to do the fast commit; then
> we don't need to do any thing more.  Otherwise the fast commit
> machinery can do its thing, writing inode changes to the journal, and
> once it is done, it can release the mutex and ext4 fsync can return.
>
> Does that make sense?
Thanks for the suggestion, I will implement this in V3.
>
>                                         - Ted