https://bugzilla.kernel.org/show_bug.cgi?id=200753

--- Comment #9 from Theodore Tso (tytso@xxxxxxx) ---

For writes, userspace must call fsync(2) and check the error returns from fsync(2), write(2), and close(2) if it wants to be sure of catching the error report. (For some remote file systems, such as AFS, the error reporting of a quota overflow happens on close(2), since it's only on the close that the file is sent to the server.) If you don't call fsync(2), the kernel may not even attempt to write the data blocks (or changes to the inode) to disk before the userspace program exits, so there is no guarantee there would be any opportunity for the system to even *notice* that there is a problem.

Also, as far as metadata blocks (such as inode table blocks) are concerned, what's generally important is whether they are successfully written to the journal. That's because in real life there are two cases that cover the *vast* majority of errors:

(a) The device has disappeared on us, because it's been unplugged from the computer, the last fibre channel connection between the computer and the disk has been lost, etc.

(b) There is a media error.

For (a), so long as the writes have made it to the journal, that's what is important. If the disk has disappeared, then when it comes back, we will replay the journal, and the inode table updates will be written.

For (b), modern storage devices generally have a bad block replacement pool, and if there is a problem with the recording medium, writes will use a newly allocated block from the bad block sparing pool; this is transparent to the host software.

Modelling errors by using a device-mapper target to force-fail certain blocks permanently might reflect how disks behaved in the early 1980's on PC's (e.g., pre-IDE and pre-SATA), but it doesn't reflect how storage devices behave today. One could argue that ext4 should do the right thing even when using hardware which is 35+ years old.
The problem is that if, for example, we forced the disk to actually persist writes after each inode update in fsck, we would destroy performance. You can try to simulate this by hacking e2fsck to force the use of O_DIRECT reads and writes (which eliminates buffering, so each read and write call results in a synchronous I/O request to the device). You will find that the results are not pretty.

Hence, trading off a performance disaster to satisfy some academic who is writing a paper about whether or not file systems handle artificial I/O error injections that do not comport with reality is really not something I'm particularly interested in.....