On 2019/2/11 12:24, Theodore Y. Ts'o Wrote:
> On Wed, Jan 30, 2019 at 02:49:37PM +0800, zhangyi (F) wrote:
>> Now, we capture a data corruption problem on ext4 while we're truncating
>> an extent index block. Imaging that if we are revoking a buffer which
>> has been journaled by the committing transaction, the buffer's jbddirty
>> flag will not be cleared in jbd2_journal_forget(), so the commit code
>> will set the buffer dirty flag again after refile the buffer.
>>
>> fsx                               kjournald2
>>                                   jbd2_journal_commit_transaction
>> jbd2_journal_revoke                commit phase 1~5...
>>  jbd2_journal_forget
>>    belongs to older transaction    commit phase 6
>>    jbddirty not clear               __jbd2_journal_refile_buffer
>>                                      __jbd2_journal_unfile_buffer
>>                                       test_clear_buffer_jbddirty
>>                                        mark_buffer_dirty
>>
>> Finally, if the freed extent index block was allocated again as data
>> block by some other files, it may corrupt the file data after writing
>> cached pages later, such as during unmount time. (In general,
>> clean_bdev_aliases() related helpers should be invoked after
>> re-allocation to prevent the above corruption, but unfortunately we
>> missed it when zeroout the head of extra extent blocks in
>> ext4_ext_handle_unwritten_extents()).
>>
>> This patch mark buffer as freed and set j_next_transaction to the new
>> transaction when it already belongs to the committing transaction in
>> jbd2_journal_forget(), so that commit code knows it should clear dirty
>> bits when it is done with the buffer.
>>
>> This problem can be reproduced by xfstests generic/455 easily with
>> seeds (3246 3247 3248 3249).
>>
>> Signed-off-by: zhangyi (F) <yi.zhang@xxxxxxxxxx>
>> Reviewed-by: Jan Kara <jack@xxxxxxx>
>> Cc: stable@xxxxxxxxxxxxxxx
>
> Thanks, applied.
>
> By the way, I wasn't able to easily reproduce the problem using the
> given seeds. Out of curiosity, what sort test system were you using?
> (e.g., how many CPU's, how much memory, what kind of storage device,
> etc.)

Yes, I also wasn't able to reproduce the problem very easily, because it
depends on the block allocation logic. So, to increase the probability,
I chose a relatively small partition (5GB). I reproduced this problem on
an x86_64 KVM virtual machine with 16 cores, 16GB of memory and two 5GB
virtio block devices (based on an SSD RAID).

Thanks,
Yi.