Re: [RFC PATCH] jbd2: fix a potential assertion failure due to improperly dirtied buffer

Jan Kara <jack@xxxxxxx> · Tue, 6 Aug 2024 15:40:23 +0200

Hello!

On Sat 20-07-24 14:23:56, zhangshida wrote:
> From: Shida Zhang <zhangshida@xxxxxxxxxx>
> 
> On an old kernel version(4.19, ext3, journal=data, pagesize=64k),
> an assertion failure will occasionally be triggered by the line below:

OK, just out of curiosity, why are you using data=journal mode? It doesn't
really get that much testing and the performance is quite bad...

> jbd2_journal_commit_transaction
> {
> ...
> J_ASSERT_BH(bh, !buffer_dirty(bh));
> /*
> * The buffer on BJ_Forget list and not jbddirty means
> ...
> }
> 
> AFAIC, that's how the problem works:
> --------
> journal_unmap_buffer
> jbd2_journal_invalidatepage
> __ext4_journalled_invalidatepage
> ext4_journalled_invalidatepage
> do_invalidatepage
> truncate_inode_pages_range
> truncate_inode_pages
> truncate_pagecache
> ext4_setattr
> --------
>
> First try to truncate and invalidate the page.
> Sometimes the buffer and the page won't be freed immediately.
> the buffer will be sent to the BJ_Forget list of the currently
> committing transaction. Maybe the transaction knows when and how
> to free the buffer better.
> The buffer's states now: !jbd_dirty !mapped !dirty
> 
> Then jbd2_journal_commit_transaction(）will try to iterate over the
> BJ_Forget list:
> --------
> jbd2_journal_commit_transaction()
> 	while (commit_transaction->t_forget) {
> 	...
> 	}
> --------
> 
> At this exact moment, another write comes:
> --------
> mark_buffer_dirty
> __block_write_begin_int
> __block_write_begin
> ext4_write_begin
> --------
> it sees a unmapped new buffer, and marks it as dirty.

This should not happen. When ext4_setattr() truncates the file, we do not
allow reallocating these blocks for other purposes until the transaction
commits. Did you find this using some tracing or just code analysis?

There have been quite some fixes to data=journal mode since 4.19 so it may
quite well be that your problem is actually already fixed in current
kernels...

								Honza

> Finally, jbd2_journal_commit_transaction(）will trip over the assertion
> during the BJ_Forget list iteration.
> 
> After an code analysis, it seems that nothing can prevent the
> __block_write_begin() from dirtying the buffer at this very moment.
> 
> The same condition may also be applied to the lattest kernel version.
> 
> To fix it:
> Add some checks to see if it is the case dirtied by __block_write_begin().
> if yes, it's okay and just let it go. The one who dirtied it and the
> jbd2_journal_commit_transaction() will know how to cooperate together and
> deal with it in a proper manner.
> if no, try to complain as we normally will do.
> 
> Reported-by: Baolin Liu <liubaolin@xxxxxxxxxx>
> Signed-off-by: Shida Zhang <zhangshida@xxxxxxxxxx>
> ---
>  fs/jbd2/commit.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
> index 75ea4e9a5cab..2c13d0af92d8 100644
> --- a/fs/jbd2/commit.c
> +++ b/fs/jbd2/commit.c
> @@ -1023,7 +1023,20 @@ void jbd2_journal_commit_transaction(journal_t *journal)
>  			if (is_journal_aborted(journal))
>  				clear_buffer_jbddirty(bh);
>  		} else {
> -			J_ASSERT_BH(bh, !buffer_dirty(bh));
> +			bool try_to_complain = 1;
> +			struct folio *folio = NULL;
> +
> +			folio = bh->b_folio;
> +			/*
> +			 * Try not to complain about the dirty buffer marked dirty
> +			 * by __block_write_begin().
> +			 */
> +			if (buffer_dirty(bh) && folio && folio->mapping
> +			    && folio_test_locked(folio))
> +				try_to_complain = 0;
> +
> +			if (try_to_complain)
> +				J_ASSERT_BH(bh, !buffer_dirty(bh));
>  			/*
>  			 * The buffer on BJ_Forget list and not jbddirty means
>  			 * it has been freed by this transaction and hence it
> -- 
> 2.33.0
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR