On Thu, Aug 08, 2019 at 08:45:45PM -0700, Harshad Shirwadkar wrote: > This patch adds new helper APIs that ext4 needs for fast > commits. These new fast commit APIs are used by subsequent fast commit > patches to implement fast commits. Following new APIs are added: > > /* > * Returns when either a full commit or a fast commit > * completes > */ > int jbd2_fc_complete_commit(journal_tc *journal, tid_t tid, > tid_t tid, tid_t subtid) I think there is an opportunity to do something more efficient. Right now, the ext4_fsync() calls this function, and the file system can only do a "fast commit" if all of the modifications made to the file system to date are "fast commit eligible". Otherwise, we have to fall back to a normal, slow commit. We can make this decision on a much more granular level. Suppose that so far during the life of the current transaction, inodes A, B, and C have been modified. The modification to inode A is not fast commit eligible (maybe the inode is deleted, or it is involved in a directory rename, etc.). The modification to inode B is fast commit eligible, but an fsync was not requested for it. And the modification to inode C *is* fast commit eligble, *and* fsync() has been requested for it. We only need to write the information for inode C to the fast commit area. The fact that inode A is not fast commit eligible isn't a problem. It will get committed when the normal transaction closes, perhaps when the 5 second commit transaction timer expires. And inode B, even though its changes might be fast commit eligible, might require writing a large number of data blocks if it were included in the fast commit. So excluding inodes A and B from the fast commit, and only writing the logical changes corresponding to the those made to inode C, will allow a fast commit to take place. In order to do that, though, the ext4's fast commit machinery needs to know which inode we actually need to do the fast commit for. And so for that reason, it's actually probably better not to run the changes through the commit thread. That makes it harder to plumb the file system specific information through, and it also requires waking up the commit thread and waiting for it to get scheduled. Instead, ext4_fsync() could just call the fast commit machinery, and the only thing we need to expose is a way for the fast commit machinery to attempt to grab a mutex preventing the normal commit thread from starting a normal commit. If it loses the race, and the normal commit takes place before we manage to do the fast commit; then we don't need to do any thing more. Otherwise the fast commit machinery can do its thing, writing inode changes to the journal, and once it is done, it can release the mutex and ext4 fsync can return. Does that make sense? - Ted