Re: [PATCH] jbd2: avoid mount failed when commit block is partial submitted

Jan Kara <jack@xxxxxxx> · Thu, 11 Apr 2024 15:37:18 +0200

On Sun 07-04-24 09:37:25, yebin (H) wrote:
> On 2024/4/3 18:11, Jan Kara wrote:
> > On Tue 02-04-24 23:37:42, Theodore Ts'o wrote:
> > > On Tue, Apr 02, 2024 at 03:42:40PM +0200, Jan Kara wrote:
> > > > On Tue 02-04-24 17:09:51, Ye Bin wrote:
> > > > > We encountered a problem where the file system could not be mounted
> > > > > after a power-off. Analysis of the file system image shows that only
> > > > > part of the last commit block was written.
> > > > > To solve this, if the commit block checksum is incorrect, check whether
> > > > > the next block has a valid magic number and transaction ID. If it does
> > > > > not, just drop the last transaction and ignore the checksum error. In
> > > > > theory the transaction ID may wrap around, which could still cause a
> > > > > mount failure.
> > > > > 
> > > > > Signed-off-by: Ye Bin <yebin10@xxxxxxxxxx>
> > > > So this is curious. The commit block data is fully within one sector and
> > > > the expectation of the journaling is that either full sector or nothing is
> > > > written. So what kind of storage were you using that it breaks these
> > > > expectations?
> > > If the physical sector size is 512 bytes and the file system block
> > > is 4k, I suppose it's possible that on a crash only part of the 4k
> > > commit block gets written.
> > I was thinking about that as well but the commit block looks like:
> > 
> > struct commit_header {
> >          __be32          h_magic;
> >          __be32          h_blocktype;
> >          __be32          h_sequence;
> >          unsigned char   h_chksum_type;
> >          unsigned char   h_chksum_size;
> >          unsigned char   h_padding[2];
> >          __be32          h_chksum[JBD2_CHECKSUM_BYTES];
> >          __be64          h_commit_sec;
> >          __be32          h_commit_nsec;
> > };
> > 
> > where JBD2_CHECKSUM_BYTES is 8. So all the data in the commit block
> > including the checksum is in the first 60 bytes. Hence I would be really
> > surprised if some storage can tear that...
> This issue has been encountered a few times in the context of eMMC devices.
> The vendor has confirmed that only 512-byte atomicity can be ensured in the
> firmware. Although the valid data is only 60 bytes, the entire commit block
> is used for calculating the checksum:
> jbd2_commit_block_csum_verify:
> ...
> calculated = jbd2_chksum(j, j->j_csum_seed, buf, j->j_blocksize);
> ...

Ah, indeed. This is the bit I've missed. Thanks for the explanation! Still I
think trying to somehow automatically deal with a wrong commit block checksum
is too dangerous because it can result in fs corruption in some (unlikely)
cases. OTOH I understand that journal replay failure after a power failure
isn't great either, so we need to think about how to fix this...

								Honza

-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR