Re: [PATCH v4 1/4] jbd2: make sure dirty flag is cleared while revoking a buffer which belongs to older transaction

"Theodore Y. Ts'o" <tytso@xxxxxxx> · Sun, 10 Feb 2019 23:24:33 -0500

On Wed, Jan 30, 2019 at 02:49:37PM +0800, zhangyi (F) wrote:
> We have caught a data corruption problem on ext4 while truncating
> an extent index block. Imagine that we are revoking a buffer which
> has been journaled by the committing transaction: the buffer's jbddirty
> flag will not be cleared in jbd2_journal_forget(), so the commit code
> will set the buffer dirty flag again after refiling the buffer.
> 
> fsx                               kjournald2
>                                   jbd2_journal_commit_transaction
> jbd2_journal_revoke                commit phase 1~5...
>  jbd2_journal_forget
>    belongs to older transaction    commit phase 6
>    jbddirty not cleared             __jbd2_journal_refile_buffer
>                                      __jbd2_journal_unfile_buffer
>                                       test_clear_buffer_jbddirty
>                                        mark_buffer_dirty
> 
> Finally, if the freed extent index block is allocated again as a data
> block by some other file, it may corrupt the file data when the cached
> pages are written back later, such as at unmount time. (In general,
> clean_bdev_aliases() related helpers should be invoked after
> re-allocation to prevent the above corruption, but unfortunately we
> missed it when zeroing out the head of extra extent blocks in
> ext4_ext_handle_unwritten_extents()).
> 
> This patch marks the buffer as freed and sets b_next_transaction to the
> new transaction when it already belongs to the committing transaction in
> jbd2_journal_forget(), so that the commit code knows it should clear the
> dirty bits when it is done with the buffer.
> 
> This problem can be reproduced by xfstests generic/455 easily with
> seeds (3246 3247 3248 3249).
> 
> Signed-off-by: zhangyi (F) <yi.zhang@xxxxxxxxxx>
> Reviewed-by: Jan Kara <jack@xxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx
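
For anyone following along, the interleaving described above can be
modeled with a small user-space sketch. This is only a toy model under
stated assumptions: struct toy_buffer and the toy_* functions are
hypothetical names, not kernel code, and they capture just the three
flags mentioned in the description (jbddirty, dirty, freed), not the
real jbd2 list handling or locking.

```c
#include <stdbool.h>

/* Toy stand-in for the buffer state; NOT the kernel's buffer_head. */
struct toy_buffer {
	bool jbddirty;	/* journal considers the buffer dirty */
	bool dirty;	/* buffer will be written back to disk */
	bool freed;	/* buffer was marked freed by forget */
};

/*
 * Toy version of forgetting a buffer that belongs to the committing
 * (older) transaction. Unpatched: nothing happens, jbddirty stays set.
 * Patched: the buffer is marked freed so the commit side can react.
 */
static void toy_forget(struct toy_buffer *bh, bool patched)
{
	if (patched)
		bh->freed = true;
}

/*
 * Toy version of the end of commit: a freed buffer gets its jbddirty
 * flag dropped; otherwise a set jbddirty flag turns into a real dirty
 * flag when the buffer is unfiled.
 */
static void toy_commit_refile(struct toy_buffer *bh)
{
	if (bh->freed) {
		bh->freed = false;
		bh->jbddirty = false;
	}
	if (bh->jbddirty) {
		bh->jbddirty = false;
		bh->dirty = true;
	}
}

/* Run forget followed by commit; report whether the buffer ends dirty. */
static bool toy_dirty_after_commit(bool patched)
{
	struct toy_buffer bh = { .jbddirty = true };

	toy_forget(&bh, patched);
	toy_commit_refile(&bh);
	return bh.dirty;
}
```

In this model toy_dirty_after_commit(false) leaves the freed block
dirty, which is the stale write that can later land on a reallocated
data block, while toy_dirty_after_commit(true) leaves it clean.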

Thanks, applied.

By the way, I wasn't able to easily reproduce the problem using the
given seeds.  Out of curiosity, what sort of test system were you using?
(e.g., how many CPUs, how much memory, what kind of storage device,
etc.)

				- Ted