On Sun 07-04-24 09:37:25, yebin (H) wrote:
> On 2024/4/3 18:11, Jan Kara wrote:
> > On Tue 02-04-24 23:37:42, Theodore Ts'o wrote:
> > > On Tue, Apr 02, 2024 at 03:42:40PM +0200, Jan Kara wrote:
> > > > On Tue 02-04-24 17:09:51, Ye Bin wrote:
> > > > > We encountered a problem that the file system could not be mounted in
> > > > > the power-off scenario. The analysis of the file system mirror shows that
> > > > > only part of the data is written to the last commit block.
> > > > > To solve above issue, if commit block checksum is incorrect, check the next
> > > > > block if has valid magic and transaction ID. If next block hasn't valid
> > > > > magic or transaction ID then just drop the last transaction ignore checksum
> > > > > error. Theoretically, the transaction ID maybe occur loopback, which may cause
> > > > > the mounting failure.
> > > > >
> > > > > Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> > > > So this is curious. The commit block data is fully within one sector and
> > > > the expectation of the journaling is that either full sector or nothing is
> > > > written. So what kind of storage were you using that it breaks these
> > > > expectations?
> > > I suppose if the physical sector size is 512 bytes, and the file
> > > system block is 4k, I suppose it's possible that on a crash, that part
> > > of the 4k commit block could be written.
> > I was thinking about that as well but the commit block looks like:
> >
> > struct commit_header {
> >         __be32          h_magic;
> >         __be32          h_blocktype;
> >         __be32          h_sequence;
> >         unsigned char   h_chksum_type;
> >         unsigned char   h_chksum_size;
> >         unsigned char   h_padding[2];
> >         __be32          h_chksum[JBD2_CHECKSUM_BYTES];
> >         __be64          h_commit_sec;
> >         __be32          h_commit_nsec;
> > };
> >
> > where JBD2_CHECKSUM_BYTES is 8. So all the data in the commit block
> > including the checksum is in the first 60 bytes. Hence I would be really
> > surprised if some storage can tear that...
> This issue has been encountered a few times in the context of eMMC devices.
> The vendor has confirmed that only 512-byte atomicity can be ensured in the
> firmware. Although the valid data is only 60 bytes, the entire commit block
> is used for calculating the checksum.
> jbd2_commit_block_csum_verify:
> ...
> calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
> ...

Ah, indeed. This is the bit I've missed. Thanks for the explanation! Still I
think trying to somehow automatically deal with a wrong commit block checksum
is too dangerous because it can result in fs corruption in some (unlikely)
cases. OTOH I understand journal replay failure after a power fail isn't
great either, so we need to think about how to fix this...

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR
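
For illustration only, the failure mode discussed above can be reproduced
with a small userspace sketch. This is not kernel code and not the proposed
fix. Every name in it (toy_crc32c, block_csum, seal_block, verify_block,
torn_commit_verifies) is hypothetical, the CRC routine is a simple bitwise
stand-in for jbd2_chksum(), and the checksum is stored in host byte order
for brevity. It assumes a 4096-byte journal block, 512-byte write atomicity
as reported for the eMMC devices, and the checksum field at byte offset 16
as implied by the struct layout quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   4096u  /* journal block size (assumed) */
#define SECTOR_SIZE   512u  /* atomic write unit per the vendor statement */
#define HEADER_BYTES   60u  /* meaningful commit header bytes */
#define CSUM_OFFSET    16u  /* offset of h_chksum[0] in struct commit_header */

/* Toy stand-in for jbd2_chksum(): bitwise, reflected CRC32C. */
static uint32_t toy_crc32c(uint32_t crc, const uint8_t *buf, size_t len)
{
	while (len--) {
		crc ^= *buf++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}

/* Checksum over the WHOLE block, with the checksum field treated as zero. */
static uint32_t block_csum(uint8_t *block)
{
	uint8_t saved[4];
	uint32_t csum;

	memcpy(saved, block + CSUM_OFFSET, sizeof(saved));
	memset(block + CSUM_OFFSET, 0, sizeof(saved));
	csum = toy_crc32c(0xFFFFFFFFu, block, BLOCK_SIZE);
	memcpy(block + CSUM_OFFSET, saved, sizeof(saved));
	return csum;
}

/* Store the checksum in the header (host byte order, a simplification). */
static void seal_block(uint8_t *block)
{
	uint32_t csum = block_csum(block);

	memcpy(block + CSUM_OFFSET, &csum, sizeof(csum));
}

static int verify_block(uint8_t *block)
{
	uint32_t stored;

	memcpy(&stored, block + CSUM_OFFSET, sizeof(stored));
	return stored == block_csum(block);
}

/*
 * Simulate a torn commit block write: the old on-disk block is all zeroes
 * except for one stale byte at stale_offset, and only the first sector of
 * the new commit block reaches the medium before power is lost.
 * Returns 1 if the resulting block still verifies, 0 if not, -1 on bad input.
 */
static int torn_commit_verifies(size_t stale_offset, uint8_t stale_byte)
{
	static uint8_t new_block[BLOCK_SIZE], on_disk[BLOCK_SIZE];

	if (stale_offset >= BLOCK_SIZE)
		return -1;

	memset(new_block, 0, BLOCK_SIZE);
	memset(new_block, 0xAB, HEADER_BYTES);   /* placeholder header contents */
	seal_block(new_block);

	memset(on_disk, 0, BLOCK_SIZE);
	on_disk[stale_offset] = stale_byte;      /* leftover from an older write */
	memcpy(on_disk, new_block, SECTOR_SIZE); /* only sector 0 is persisted */

	return verify_block(on_disk);
}
```

With these assumptions, a stale byte anywhere past the first 512 bytes makes
verification fail even though all 60 header bytes were written intact, while
a stale byte inside the first sector is harmless because that sector is
overwritten atomically.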