Re: Metadata CRC error detected at xfs_dir3_block_read_verify+0x9e/0xc0 [xfs], xfs_dir3_block block 0x86f58

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 17 Mar 2022 19:24:11 +1100

On Thu, Mar 17, 2022 at 07:49:02AM +0100, Manfred Spraul wrote:
> Hi Dave,
> 
> [+Ted as the topic also applies to ext4]
> 
> On 3/17/22 04:08, Dave Chinner wrote:
> > On Thu, Mar 17, 2022 at 01:47:05PM +1100, Dave Chinner wrote:
> > > On Wed, Mar 16, 2022 at 09:55:04AM +0100, Manfred Spraul wrote:
> > > > Hi Dave,
> > > > 
> > > > On 3/14/22 16:18, Manfred Spraul wrote:
> > > > 
> > > > But:
> > > > 
> > > > I've checked the eMMC specification, and the spec allows that teared write
> > > > happen:
> > > Yes, most storage only guarantees that sector writes are atomic and
> > > so multi-sector writes have no guarantees of being written
> > > atomically.  IOWs, all storage technologies that currently exist are
> > > allowed to tear multi-sector writes.
> > > 
> > > However, FUA writes are guaranteed to be whole on persistent storage
> > > regardless of size when the hardware signals completion. And any
> > > write that the hardware has signalled as complete before a cache
> > > flush is received is also guaranteed to be whole on persistent
> > > storage when the cache flush is signalled as complete by the
> > > hardware. These mechanisms provide protection against torn writes.
> 
> My plan was to create a replay application that randomly creates disc images
> allowed by the writeback_cache_control documentation.
> 
> https://www.kernel.org/doc/html/latest/block/writeback_cache_control.html
>
> And then check that the filesystem behaves as expected/defined.

We already have that tool that exercises stepwise flush/fua aware
write recovery for filesystem testing: dm-logwrites was written and
integrated into fstests years ago (2016?) by Josef Bacik for testing
btrfs recovery, but it was a generic solution that all filesystems
can use to test failure recovery....

See, for example, common/dmlogwrites and tests/generic/482 - g/482
uses fsstress to randomly modify the filesystem while dm-logwrites
records all the writes made by the filesystem. It then replays them
one flush/fua at a time, mounting the filesystem to ensure that it
can recover the filesystem, then runs filesystem checkers to ensure
that the filesystem does not have any corrupt metadata. Then it
replays to the next flush/fua and repeats.

tools/dm-logwrite-replay provides a script and documents the
methodology to run step by step through replay of g/482 failures to
be able to reliably reproduce and diagnose the cause of the failure.

There's no need to re-invent the wheel if we've already got a
perfectly good one...

> > > > Is my understanding correct that XFS support neither eMMC nor NVM devices?
> > > > (unless there is a battery backup that exceeds the guarantees from the spec)
> > > Incorrect.
> > > 
> > > They are supported just fine because flush/FUA semantics provide
> > > guarantees against torn writes in normal operation. IOWs, torn
> > > writes are something that almost *never* happen in real life, even
> > > when power fails suddenly. Despite this, XFS can detect it has
> > > occurred (because broken storage is all too common!), and if it
> > > can't recovery automatically, it will shut down and ask the user to
> > > correct the problem.
> 
> So for xfs the behavior should be:
> 
> - without torn writes: Mount always successful, no errors when accessing the
> content.

Yes.

Of course, there are software bugs, so mounts, recovery and
subsequent repair testing can still fail.

> - with torn writes: There may be error that will be detected only at
> runtime. The errors may at the end cause a file system shutdown.

Yes, and they may even prevent the filesystem from being mounted
because recovery trips over them (e.g. processing pending unlinked
inodes or replaying incomplete intents).

> (commented dmesg is attached)
> 
> The application I have in mind are embedded systems.
> I.e. there is no user that can correct something, the recovery strategy must
> be included in the design.

Good luck with that - storage hardware fails in ways that no
existing filesystem can recover automatically from 100% of the time.
And very few even attempt to do so because it is largely an
impossible requirement to fulfil. Torn writes are just the tip of
the iceberg....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx