On Tue 02-04-24 23:37:42, Theodore Ts'o wrote: > On Tue, Apr 02, 2024 at 03:42:40PM +0200, Jan Kara wrote: > > On Tue 02-04-24 17:09:51, Ye Bin wrote: > > > We encountered a problem that the file system could not be mounted in > > > the power-off scenario. The analysis of the file system mirror shows that > > > only part of the data is written to the last commit block. > > > To solve above issue, if commit block checksum is incorrect, check the next > > > block if has valid magic and transaction ID. If next block hasn't valid > > > magic or transaction ID then just drop the last transaction ignore checksum > > > error. Theoretically, the transaction ID maybe occur loopback, which may cause > > > the mounting failure. > > > > > > Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx> > > > > So this is curious. The commit block data is fully within one sector and > > the expectation of the journaling is that either full sector or nothing is > > written. So what kind of storage were you using that it breaks these > > expectations? > > I suppose if the physical sector size is 512 bytes, and the file > system block is 4k, I suppose it's possible that on a crash, that part > of the 4k commit block could be written. I was thinking about that as well but the commit block looks like: truct commit_header { __be32 h_magic; __be32 h_blocktype; __be32 h_sequence; unsigned char h_chksum_type; unsigned char h_chksum_size; unsigned char h_padding[2]; __be32 h_chksum[JBD2_CHECKSUM_BYTES]; __be64 h_commit_sec; __be32 h_commit_nsec; }; where JBD2_CHECKSUM_BYTES is 8. So all the data in the commit block including the checksum is in the first 60 bytes. Hence I would be really surprised if some storage can tear that... Hence either Ye Bin is running on some really exotic storage or the storage / CPU in fact flipped bits somewhere so that the checksum didn't match or the commit block was in fact not written now but it was a leftover from previous journal use and h_sequence happened to match. Very unlikely but depending on how exactly they do their powerfail testing I can imagine it would be possible with enough tries... > In *practice* though, this > is super rare. That's because on many modern HDD's, the physical > sector size is 4k (because the ECC overhead is much lower), even if > the logical sector size is 512 byte (for Windows 98 compatibility). > And even on HDD's where the physical sector size is really 512 bytes, > the way the sectors are laid out in a serpentine fashion, it is > *highly* likely that 4k write won't get torn. > > And while this is *possible*, it's also possible that some kind of I/O > transfer error --- such as some bit flips which breaks the checksum on > the commit block, but also trashes the tid of the subsequent block, > such that your patch gets tricked into thinking that this is the > partial last commit, when in fact it's not the last commit, thus > causing the journal replay abort early. If that's case, it's much > safer to force fsck to be run to detect any inconsistency that might > result. Yeah, I agree in these cases of a corrupted journal it seems dangerous to just try to continue without fsck based on some heuristics. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR