Re: Metadata CRC error detected at xfs_dir3_block_read_verify+0x9e/0xc0 [xfs], xfs_dir3_block block 0x86f58

Manfred Spraul <manfred@xxxxxxxxxxxxxxxx> · Thu, 17 Mar 2022 17:09:49 +0100

Hi Dave,

On 3/17/22 09:24, Dave Chinner wrote:
On Thu, Mar 17, 2022 at 07:49:02AM +0100, Manfred Spraul wrote:
Hi Dave,

[+Ted as the topic also applies to ext4]

On 3/17/22 04:08, Dave Chinner wrote:
On Thu, Mar 17, 2022 at 01:47:05PM +1100, Dave Chinner wrote:
On Wed, Mar 16, 2022 at 09:55:04AM +0100, Manfred Spraul wrote:
Hi Dave,

On 3/14/22 16:18, Manfred Spraul wrote:

But:

I've checked the eMMC specification, and the spec allows torn writes to
happen:
Yes, most storage only guarantees that sector writes are atomic and
so multi-sector writes have no guarantees of being written
atomically.  IOWs, all storage technologies that currently exist are
allowed to tear multi-sector writes.

However, FUA writes are guaranteed to be whole on persistent storage
regardless of size when the hardware signals completion. And any
write that the hardware has signalled as complete before a cache
flush is received is also guaranteed to be whole on persistent
storage when the cache flush is signalled as complete by the
hardware. These mechanisms provide protection against torn writes.
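
As a rough illustration of the two guarantees described above, here is a minimal Python sketch. It assumes a simplified model in which every listed operation has already been signalled as complete by the hardware; the `Write` and `Flush` types and the `guaranteed_durable` helper are hypothetical names made up for this example and are not part of any kernel or fstests interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    block: int
    data: str
    fua: bool = False  # True models a FUA write


@dataclass(frozen=True)
class Flush:
    pass


def guaranteed_durable(completed_ops):
    """Return indices of completed writes that must be whole on stable media."""
    durable = set()
    pending = []  # completed writes that may still sit in the volatile cache
    for i, op in enumerate(completed_ops):
        if isinstance(op, Flush):
            # A completed cache flush covers every write completed before it.
            durable.update(pending)
            pending.clear()
        elif op.fua:
            # A completed FUA write is on stable media by itself.
            durable.add(i)
        else:
            pending.append(i)
    return durable


ops = [Write(0, "a"), Write(1, "b", fua=True), Flush(), Write(2, "c")]
print(sorted(guaranteed_durable(ops)))  # [0, 1]
```

The last write in the example has no guarantee: it completed after the flush and was not FUA, so it may be missing or torn after a power failure.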
My plan was to create a replay application that randomly creates disc images
allowed by the writeback_cache_control documentation.

https://www.kernel.org/doc/html/latest/block/writeback_cache_control.html

And then check that the filesystem behaves as expected/defined.
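
The idea of generating the disc images that could legally exist after a crash might be sketched as follows. This is only an assumption-laden model, not the replay application mentioned above: it treats each sector write as atomic, treats every unflushed sector write as independently present or absent, and uses a hypothetical helper name, `possible_crash_images`.

```python
import itertools


def possible_crash_images(base, unflushed):
    """Enumerate images that could be on the media after a power failure.

    base:      dict mapping sector -> data known to be durable
    unflushed: list of (sector, data) sector writes that completed
               without a later flush and without FUA
    """
    images = []
    for r in range(len(unflushed) + 1):
        # combinations() keeps the original order, so a later write to the
        # same sector overrides an earlier one when both are chosen.
        for subset in itertools.combinations(unflushed, r):
            image = dict(base)
            for sector, data in subset:
                image[sector] = data
            if image not in images:
                images.append(image)
    return images


base = {0: "old0", 1: "old1"}
# One two-sector write, split into its two sector-sized parts.
unflushed = [(0, "new0"), (1, "new1")]
for image in possible_crash_images(base, unflushed):
    print(image)
```

Two of the four images printed contain one new and one old sector, which is exactly a torn two-sector write. A random tester would sample from this set instead of enumerating it, since the number of images grows exponentially with the number of unflushed writes.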
We already have that tool that exercises stepwise flush/fua aware
write recovery for filesystem testing: dm-logwrites was written and
integrated into fstests years ago (2016?) by Josef Bacik for testing
btrfs recovery, but it was a generic solution that all filesystems
can use to test failure recovery....

See, for example, common/dmlogwrites and tests/generic/482 - g/482
uses fsstress to randomly modify the filesystem while dm-logwrites
records all the writes made by the filesystem. It then replays them
one flush/fua at a time, mounting the filesystem to ensure that it
can recover the filesystem, then runs filesystem checkers to ensure
that the filesystem does not have any corrupt metadata. Then it
replays to the next flush/fua and repeats.
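
The replay-and-check loop described above can be sketched in a few lines of Python. This is a toy model only: it does not use the real dm-logwrites or fstests interfaces, the log format is invented for the example, and the `check` callback is a stand-in for mounting the filesystem and running the filesystem checker.

```python
def replay_to_each_flush(log, check):
    """Replay a recorded write log and run a check at every flush point.

    log:   list of ("write", block, data) or ("flush",) entries
    check: callable taking the image (dict block -> data) and returning
           True if the image is consistent
    """
    image = {}
    results = []
    for entry in log:
        if entry[0] == "write":
            _, block, data = entry
            image[block] = data
        elif entry[0] == "flush":
            # Everything replayed so far is assumed durable at this point.
            results.append(check(dict(image)))
    return results


def commit_points_to_data(image):
    # Hypothetical invariant: if the commit record in block 0 exists,
    # the data block 10 it refers to must exist as well.
    return 0 not in image or 10 in image


log = [
    ("write", 10, "data"),
    ("flush",),
    ("write", 0, "commit->10"),
    ("flush",),
]
print(replay_to_each_flush(log, commit_points_to_data))  # [True, True]
```

If the log had written the commit record before the data block and flushed in between, the first check would fail, which is the kind of ordering bug this style of test is meant to expose.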

tools/dm-logwrite-replay provides a script and documents the
methodology to run step by step through replay of g/482 failures to
be able to reliably reproduce and diagnose the cause of the failure.

There's no need to re-invent the wheel if we've already got a
perfectly good one...

Thanks a lot for the hint!

I was thinking about where a replay tool might exist and came up with nbd.
The feedback was that no such tool exists, so I wrote something.

I didn't think about dm.

I'll look at dm-log-writes.

Is my understanding correct that XFS supports neither eMMC nor NVM devices?
(unless there is a battery backup that exceeds the guarantees from the spec)
Incorrect.

They are supported just fine because flush/FUA semantics provide
guarantees against torn writes in normal operation. IOWs, torn
writes are something that almost *never* happen in real life, even
when power fails suddenly. Despite this, XFS can detect it has
occurred (because broken storage is all too common!), and if it
can't recover automatically, it will shut down and ask the user to
correct the problem.
So for xfs the behavior should be:

- without torn writes: Mount always successful, no errors when accessing the
content.
Yes.

Of course, there are software bugs, so mounts, recovery and
subsequent repair testing can still fail.

- with torn writes: There may be errors that are detected only at
runtime. These errors may ultimately cause a filesystem shutdown.
Yes, and they may even prevent the filesystem from being mounted
because recovery trips over them (e.g. processing pending unlinked
inodes or replaying incomplete intents).

(commented dmesg is attached)

The applications I have in mind are embedded systems.
I.e. there is no user who can correct anything, so the recovery strategy must
be included in the design.
Good luck with that - storage hardware fails in ways that no
existing filesystem can recover automatically from 100% of the time.
And very few even attempt to do so because it is largely an
impossible requirement to fulfil. Torn writes are just the tip of
the iceberg....
Yes :-(

--

    Manfred