Hi all,

A code audit demonstrated that many xfsprogs utilities do not check that
the buffers they write actually make it to disk. While the userspace
buffer cache has a means to check that a buffer was written (which is to
reread the buffer after the write), most utilities mark a buffer dirty
and release it to the MRU and do not re-check the buffer. Worse yet, the
MRU will retain the buffers for all failed writes until the buffer cache
is torn down, but it in turn has no way to communicate that writes were
lost due to IO errors.

libxfs will flush the device when it is unmounted, but as there is no
return value, we again fail to notice that writes have been lost. Most
likely this leads to a corrupt filesystem, which makes it all the more
surprising that xfs_repair can lose writes yet still return 0!

Fix all this by making delwri_submit a synchronous write operation like
its kernel counterpart; teaching the buffer cache to mark the buftarg
when it knows it's losing writes; teaching the device flush functions to
return error codes; and adding a new "flush filesystem" API that user
programs can call to check for lost writes or IO errors. Then teach all
the userspace programs to flush the fs at exit and report errors.

In v2 we split up some of the patches and make sure we always fsync when
flushing a block device.

In v3 we move the buffer and disk flushing requests into the libxfs
unmount function.

v4 added some extra messages when repair writes fail.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=buffer-write-fixes
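
For readers who want a picture of the pattern the series describes (remember a failed write on the target, make the flush return an error code, and turn that into a nonzero exit status), here is a minimal sketch using only POSIX calls. It is not the libxfs code: `struct demo_target`, `demo_write`, `demo_flush`, and `demo_exit_code` are hypothetical names invented for this illustration, and the real buftarg and flush interfaces in the patches may look quite different.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a buffer target; NOT the libxfs structure. */
struct demo_target {
	int	fd;		/* backing file or block device */
	int	lost_writes;	/* sticky flag: set once any write fails */
};

/*
 * Write one buffer. On failure, mark the target so that a later flush
 * can report the loss even if this caller ignores the return value.
 */
static int
demo_write(struct demo_target *t, const void *buf, size_t len, off_t off)
{
	ssize_t	ret;

	assert(t != NULL && buf != NULL);

	ret = pwrite(t->fd, buf, len, off);
	if (ret < 0) {
		int	error = errno;	/* save before anything can clobber it */

		t->lost_writes = 1;
		return -error;
	}
	if ((size_t)ret != len) {
		/* A short write is treated as a lost write in this sketch. */
		t->lost_writes = 1;
		return -EIO;
	}
	return 0;
}

/*
 * Flush the target and return a negative errno if the fsync failed or
 * if any earlier write was lost; return 0 only when everything landed.
 */
static int
demo_flush(struct demo_target *t)
{
	assert(t != NULL);

	if (fsync(t->fd) < 0)
		return -errno;
	if (t->lost_writes)
		return -EIO;
	return 0;
}

/* Map the flush result to a process exit status. */
static int
demo_exit_code(int flush_error)
{
	return flush_error ? EXIT_FAILURE : EXIT_SUCCESS;
}
```

A program following this pattern would call `demo_flush()` once just before exiting and pass the result through `demo_exit_code()`, so that a lost write can no longer end in a zero exit status.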