On Wed, Mar 16, 2022 at 09:55:04AM +0100, Manfred Spraul wrote:
> Hi Dave,
>
> On 3/14/22 16:18, Manfred Spraul wrote:
> > Hi Dave,
> >
> > On 3/13/22 23:46, Dave Chinner wrote:
> > > OK, this test is explicitly tearing writes at the storage level.
> > > When there is an update to multiple sectors of the metadata block,
> > > the metadata will be inconsistent on disk while those individual
> > > sector writes are replayed.
> >
> > Thanks for the clarification.
> >
> > I'll modify the test application to never tear write operations and
> > retry.
> >
> > If there are findings, then I'll distribute them.
> >
> I've modified the test app, and with 4000 simulated power failures I have
> not seen any corruptions.
>
>
> Thus:
>
> - With teared write operations: 2 corruptions from ~800 simulated power
> failures
>
> - Without teared write operations: no corruptions from ~4000 simulated power
> failures.

Good to hear.

> But:
>
> I've checked the eMMC specification, and the spec allows that teared write
> happen:

Yes, most storage only guarantees that sector writes are atomic and
so multi-sector writes have no guarantees of being written
atomically. IOWs, all storage technologies that currently exist are
allowed to tear multi-sector writes.

However, FUA writes are guaranteed to be whole on persistent storage
regardless of size when the hardware signals completion. And any
write that the hardware has signalled as complete before a cache
flush is received is also guaranteed to be whole on persistent
storage when the cache flush is signalled as complete by the
hardware. These mechanisms provide protection against torn writes.

IOWs, it's up to filesystems to guarantee data is on stable storage
before they trust it fully. Filesystems are pretty good at using
REQ_FLUSH, REQ_FUA and write completion ordering to ensure that
anything they need whole and complete on stable storage is actually
whole and complete.

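To make the above concrete, here is a toy model in Python. It is only a
sketch under stated assumptions, not how any real device or filesystem
works: ToyDisk, make_block and block_is_whole are made-up names, the
"device" is sector-atomic media behind a volatile cache, and a power
failure lets an arbitrary subset of unflushed sectors reach the media.
A checksum over the whole block stands in for a metadata integrity
check.

```python
import zlib

SECTOR = 512  # bytes; the unit this toy device writes atomically


class ToyDisk:
    """Hypothetical model: sector-atomic media behind a volatile cache."""

    def __init__(self, sectors):
        self.media = bytearray(sectors * SECTOR)  # survives power loss
        self.cache = {}  # sector number -> bytes, lost on power loss

    def write(self, first_sector, data):
        # A multi-sector write is split into per-sector cache entries.
        assert len(data) % SECTOR == 0
        for i in range(len(data) // SECTOR):
            chunk = bytes(data[i * SECTOR:(i + 1) * SECTOR])
            self.cache[first_sector + i] = chunk

    def flush(self):
        # Cache flush: every write accepted so far reaches the media.
        for sector, chunk in self.cache.items():
            self.media[sector * SECTOR:(sector + 1) * SECTOR] = chunk
        self.cache.clear()

    def power_fail(self, survivors):
        # Only cached sectors listed in `survivors` reach the media.
        # Each lands whole (sector atomicity); the rest are lost.
        for sector in survivors:
            if sector in self.cache:
                chunk = self.cache[sector]
                self.media[sector * SECTOR:(sector + 1) * SECTOR] = chunk
        self.cache.clear()

    def read(self, first_sector, count):
        start = first_sector * SECTOR
        return bytes(self.media[start:start + count * SECTOR])


def make_block(fill, sectors=2):
    # Block layout (an assumption): 4-byte CRC32 of the body, then body.
    body = bytes([fill]) * (sectors * SECTOR - 4)
    return zlib.crc32(body).to_bytes(4, "little") + body


def block_is_whole(block):
    return int.from_bytes(block[:4], "little") == zlib.crc32(block[4:])


disk = ToyDisk(sectors=2)

# Write followed by a completed flush: the block is whole on the media.
disk.write(0, make_block(0xAA))
disk.flush()
flushed_ok = block_is_whole(disk.read(0, 2))

# Overwrite without a flush, then lose power after only sector 0 landed:
# the media holds new sector 0 plus old sector 1, i.e. a torn block.
disk.write(0, make_block(0xBB))
disk.power_fail(survivors=[0])
torn_detected = not block_is_whole(disk.read(0, 2))
```

Both flushed_ok and torn_detected end up True in this model: the
flushed block verifies, and the unflushed two-sector overwrite is left
half old, half new, which the checksum catches.
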
In the cases where torn writes occur because they haven't been
covered by a FUA or cache flush guarantee (such as your test),
filesystems need mechanisms in their metadata to detect such events.
CRCs are the prime mechanism for this - that's what XFS uses, and it
was XFS reporting a CRC failure when reading torn metadata that
started this whole thread.

> Is my understanding correct that XFS support neither eMMC nor NVM devices?
> (unless there is a battery backup that exceeds the guarantees from the spec)

Incorrect. They are supported just fine because flush/FUA semantics
provide guarantees against torn writes in normal operation. IOWs,
torn writes are something that almost *never* happen in real life,
even when power fails suddenly. Despite this, XFS can detect when one
has occurred (because broken storage is all too common!), and if it
can't recover automatically, it will shut down and ask the user to
correct the problem.

BTRFS and ZFS can also detect torn writes, and if you use the
(non-default) ext4 option "metadata_csum" it will also detect torn
writes to metadata via CRC failures. There are other filesystems
that can detect and correct torn writes, too.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx