Re: [PATCH v9 1/9] doc: update ext4 and journalling docs to include fast commit feature

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Tue, 22 Sep 2020 10:50:01 -0700

On Fri, Sep 18, 2020 at 05:54:43PM -0700, Harshad Shirwadkar wrote:
> This patch adds necessary documentation for fast commits.
> 
> Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx>
> ---
>  Documentation/filesystems/ext4/journal.rst | 66 ++++++++++++++++++++++
>  Documentation/filesystems/journalling.rst  | 28 +++++++++
>  2 files changed, 94 insertions(+)
> 
> diff --git a/Documentation/filesystems/ext4/journal.rst b/Documentation/filesystems/ext4/journal.rst
> index ea613ee701f5..c2e4d010a201 100644
> --- a/Documentation/filesystems/ext4/journal.rst
> +++ b/Documentation/filesystems/ext4/journal.rst
> @@ -28,6 +28,17 @@ metadata are written to disk through the journal. This is slower but
>  safest. If ``data=writeback``, dirty data blocks are not flushed to the
>  disk before the metadata are written to disk through the journal.
>  
> +In ``data=ordered`` mode, Ext4 also supports fast commits, which
> +help reduce commit latency significantly. The default ``data=ordered``
> +mode works by logging metadata blocks tothe journal. In fast commit

"to the journal"

> +mode, Ext4 only stores the minimal delta needed to recreate the
> +affected metadata in fast commit space that is shared with JBD2.
> +Once the fast commit area fills up, or if a fast commit is not possible,
> +or if the JBD2 commit timer goes off, Ext4 performs a traditional full commit.
> +A full commit invalidates all the fast commits that happened before
> +it and thus makes the fast commit area empty for further fast
> +commits. This feature needs to be enabled at compile time.

And mkfs time too, I would hope?

> +
>  The journal inode is typically inode 8. The first 68 bytes of the
>  journal inode are replicated in the ext4 superblock. The journal itself
>  is normal (but hidden) file within the filesystem. The file usually
> @@ -609,3 +620,58 @@ bytes long (but uses a full block):
>       - h\_commit\_nsec
>       - Nanoseconds component of the above timestamp.
>  
> +Fast commits
> +~~~~~~~~~~~~
> +
> +The fast commit area is organized as a log of tag-length-value (TLV) entries. Each TLV has
> +a ``struct ext4_fc_tl`` in the beginning which stores the tag and the length
> +of the entire field. It is followed by variable length tag specific value.

"The fast commit area is organized as a log of tagged variable-length
values.  Each value begins with a ``struct ext4_fc_tl`` tag that
identifies the type of the value and its length, and is followed by the
value itself." ?

I would've called that struct "ext4_fc_tag" or something, since "tl"
isn't really a word... ah well.
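
To make the layout concrete, decoding one entry might look roughly like
the sketch below. The layout is an assumption on my part (a 2-byte
little-endian tag, then a 2-byte little-endian length counting only the
value bytes that follow), and struct fc_tl_sketch / fc_next_tlv() are
hypothetical names, not the real struct ext4_fc_tl definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded entry; assumed layout: le16 tag, le16 len, value. */
struct fc_tl_sketch {
	uint16_t tag;
	uint16_t len;        /* assumed: bytes of value after the 4-byte header */
	const uint8_t *val;  /* points into the caller's buffer */
};

static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Decode the entry at buf[*off]. Returns 0 and advances *off past the
 * value on success, or -1 if the header or value would run past the end.
 */
static int fc_next_tlv(const uint8_t *buf, size_t size, size_t *off,
		       struct fc_tl_sketch *out)
{
	assert(buf != NULL && off != NULL && out != NULL);
	if (*off > size || size - *off < 4)
		return -1;                     /* no room for a header */
	out->tag = get_le16(buf + *off);
	out->len = get_le16(buf + *off + 2);
	if (size - *off - 4 < out->len)
		return -1;                     /* truncated value */
	out->val = buf + *off + 4;
	*off += 4 + (size_t)out->len;
	return 0;
}
```

Bounds are checked before every read, so a corrupt length field can
only make the walk stop early rather than read past the area.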

> +Here is the list of supported tags and their meanings:
> +
> +.. list-table::
> +   :widths: 8 20 20 32
> +   :header-rows: 1
> +
> +   * - Tag
> +     - Meaning
> +     - Value struct
> +     - Description
> +   * - EXT4_FC_TAG_HEAD
> +     - Fast commit area header
> +     - ``struct ext4_fc_head``
> +     - Stores the TID of the transaction after which these fast commits should
> +       be applied.

So I guess log recovery is supposed to apply the transaction TID, then
apply these fast commits, and then move on to the next transaction?

--D

> +   * - EXT4_FC_TAG_ADD_RANGE
> +     - Add extent to inode
> +     - ``struct ext4_fc_add_range``
> +     - Stores the inode number and extent to be added in this inode
> +   * - EXT4_FC_TAG_DEL_RANGE
> +     - Remove logical offsets from inode
> +     - ``struct ext4_fc_del_range``
> +     - Stores the inode number and the logical offset range that needs to be
> +       removed
> +   * - EXT4_FC_TAG_CREAT
> +     - Create directory entry for a newly created file
> +     - ``struct ext4_fc_dentry_info``
> +     - Stores the parent inode number, inode number and directory entry of the
> +       newly created file
> +   * - EXT4_FC_TAG_LINK
> +     - Link a directory entry to an inode
> +     - ``struct ext4_fc_dentry_info``
> +     - Stores the parent inode number, inode number and directory entry
> +   * - EXT4_FC_TAG_UNLINK
> +     - Unlink a directory entry of an inode
> +     - ``struct ext4_fc_dentry_info``
> +     - Stores the parent inode number, inode number and directory entry
> +
> +   * - EXT4_FC_TAG_PAD
> +     - Padding (unused area)
> +     - None
> +     - Unused bytes in the fast commit area.
> +
> +   * - EXT4_FC_TAG_TAIL
> +     - Mark the end of a fast commit
> +     - ``struct ext4_fc_tail``
> +     - Stores the TID of the commit and the CRC of the fast commit whose
> +       end this tag marks
> +
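
Related to the replay-order question above, here is one possible
reading of this table as a recovery walk, sketched in C. Everything
here is assumed rather than taken from the patch: the SK_TAG_* numbers
are illustrative only, HEAD's value is assumed to begin with a 4-byte
little-endian TID, TAIL's TID and CRC are not verified, and entries
after the last TAIL are treated as an incomplete fast commit.
fc_count_replayable() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative tag numbers only; the real values live in the ext4 headers. */
enum {
	SK_TAG_ADD_RANGE = 1, SK_TAG_DEL_RANGE, SK_TAG_CREAT, SK_TAG_LINK,
	SK_TAG_UNLINK, SK_TAG_PAD, SK_TAG_TAIL, SK_TAG_HEAD,
};

static uint32_t get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * Count complete fast commits that may be replayed on top of full
 * transaction 'tid'. Each entry is assumed to be le16 tag, le16 value
 * length, value. CRC checking is deliberately omitted from this sketch.
 */
static int fc_count_replayable(const uint8_t *area, size_t size, uint32_t tid)
{
	size_t off = 0;
	int complete = 0;
	int first = 1;

	assert(area != NULL);
	while (size - off >= 4) {
		uint16_t tag = (uint16_t)(area[off] | area[off + 1] << 8);
		uint16_t len = (uint16_t)(area[off + 2] | area[off + 3] << 8);
		const uint8_t *val = area + off + 4;

		if (size - off - 4 < len)
			break;                  /* truncated entry: stop */
		if (first) {
			/* The area must open with a HEAD naming transaction 'tid'. */
			if (tag != SK_TAG_HEAD || len < 4 || get_le32(val) != tid)
				return 0;
			first = 0;
		} else if (tag == SK_TAG_TAIL) {
			complete++;             /* a real walk would check TID and CRC */
		}
		/* PAD and the inode/dentry tags would be applied or skipped here. */
		off += 4 + (size_t)len;
	}
	return complete;
}
```

Under these assumptions the full transaction is replayed first and the
fast commits are applied on top of it only when HEAD names that
transaction, which is the ordering asked about above; whether the patch
really works this way is for the author to confirm.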
> diff --git a/Documentation/filesystems/journalling.rst b/Documentation/filesystems/journalling.rst
> index 58ce6b395206..a9817220dc9b 100644
> --- a/Documentation/filesystems/journalling.rst
> +++ b/Documentation/filesystems/journalling.rst
> @@ -132,6 +132,34 @@ The opportunities for abuse and DOS attacks with this should be obvious,
>  if you allow unprivileged userspace to trigger codepaths containing
>  these calls.
>  
> +Fast commits
> +~~~~~~~~~~~~
> +
> +JBD2 also allows you to perform file-system-specific delta commits known as
> +fast commits. In order to use fast commits, you first need to call
> +:c:func:`jbd2_fc_init` and tell it how many blocks at the end of the journal
> +area should be reserved for fast commits. Along with that, you will also need
> +to set the following callbacks that perform the corresponding work:
> +
> +`journal->j_fc_cleanup_cb`: Cleanup function called after every full commit and
> +fast commit.
> +
> +`journal->j_fc_replay_cb`: Replay function called for replay of fast commit
> +blocks.
> +
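
To illustrate when the two callbacks would fire, here is a userspace
sketch. The signatures are guesses: struct toy_journal stands in for
journal_t, toy_commit_done()/toy_replay() stand in for JBD2 internals,
and my_fc_cleanup()/my_fc_replay() are a hypothetical file system's
callbacks. None of these are the real jbd2 prototypes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for journal_t; only the two callbacks are modeled. */
struct toy_journal {
	void (*j_fc_cleanup_cb)(struct toy_journal *j, bool full_commit);
	int (*j_fc_replay_cb)(struct toy_journal *j, const unsigned char *blk,
			      size_t len);
	void *fs_private;   /* file-system state the callbacks may use */
};

/* Model of JBD2 finishing a commit: cleanup runs after both kinds. */
static void toy_commit_done(struct toy_journal *j, bool full_commit)
{
	assert(j != NULL);
	if (j->j_fc_cleanup_cb)
		j->j_fc_cleanup_cb(j, full_commit);
}

/* Model of recovery handing each fast commit block to the file system. */
static int toy_replay(struct toy_journal *j, const unsigned char *blocks,
		      size_t nblocks, size_t blksz)
{
	assert(j != NULL && j->j_fc_replay_cb != NULL);
	for (size_t i = 0; i < nblocks; i++) {
		int err = j->j_fc_replay_cb(j, blocks + i * blksz, blksz);
		if (err)
			return err;   /* stop replay on the first failure */
	}
	return 0;
}

/* Hypothetical file-system side; counters stand in for real work. */
struct toy_fs { int cleanups, full_cleanups, replayed; };

static void my_fc_cleanup(struct toy_journal *j, bool full_commit)
{
	struct toy_fs *fs = j->fs_private;

	fs->cleanups++;
	if (full_commit)
		fs->full_cleanups++;   /* e.g. drop all tracked fast commit state */
}

static int my_fc_replay(struct toy_journal *j, const unsigned char *blk,
			size_t len)
{
	struct toy_fs *fs = j->fs_private;

	(void)blk;
	(void)len;
	fs->replayed++;
	return 0;
}
```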
> +The file system is free to perform fast commits whenever it wants, as long
> +as it gets permission from JBD2 to do so by calling the function
> +:c:func:`jbd2_fc_start()`. Once a fast commit is done, the client
> +file system should tell JBD2 about it by calling :c:func:`jbd2_fc_stop()`.
> +If the file system wants JBD2 to perform a full commit immediately after
> +stopping the fast commit, it can do so by calling :c:func:`jbd2_fc_stop_do_commit()`.
> +This is useful if a fast commit operation fails for some reason and the only way
> +to guarantee consistency is for JBD2 to perform the full traditional commit.
> +
> +JBD2 also provides helper functions to manage fast commit buffers. The file
> +system can use :c:func:`jbd2_fc_get_buf()` and :c:func:`jbd2_fc_wait_bufs()`
> +to allocate fast commit buffers and to wait for their I/O to complete.
> +
>  Summary
>  ~~~~~~~
>  
> -- 
> 2.28.0.681.g6f77f65b4e-goog
>