https://bugzilla.kernel.org/show_bug.cgi?id=200753

--- Comment #15 from Shehbaz (shehbazjaffer007@xxxxxxxxx) ---
Thanks Theodore and Eric,

> If you don't call fsync(2), the data blocks (or changes to the inode) may not
> be attempted to be written to disk before the userspace program exits so
> there is no guarantee that would be any opportunity for the system even
> *notice* that there is a problem.

I have changed my programs to call ftruncate(2), chmod(2), chown(2), creat(2)
and open(2), followed by fsync(2) and close(2). For each program, fsync(2) and
close(2) still return success (strace output attached), even after an I/O
error has been returned to the file system. Since symlink(2) and utimes(2)
operate on paths rather than file descriptors, I did not experiment with them.

> Also, as far as metadata blocks (such as inode table blocks), what's
> generally important is whether they are successfully written to the journal.

I agree. I observe the following: once the writes to the journal are done,
there is a follow-up write to the in-place location of the inode. When this
inode write fails, no error is reported. I assume the journal takes no further
action either, since as far as it is concerned the write has already been made
in place on the file system. This is why, even though the write reached the
journal, the updated inode image is not seen after remount.

Thank you for explaining the other scenarios and real-world use cases. I will
take these into consideration when extending this work.

> For example a buffered write failure won't be seen by the application unless
> it i.e. calls fsync on the write.

I had thought that unmounting the file system would automatically flush any
blocks left in the buffer cache to disk. Following your feedback, I now make
sure fsync(2), close(2) and open(2) are called at the right places in the
userspace programs.

> Now, I probably would expect to see some errors in dmesg if for example inode
> flushing fails at unmount time, though.

I agree.
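
For reference, the call sequence described above (operate on the fd, then
fsync(2), then close(2), checking every return value) can be sketched roughly
as below. This is a minimal illustration and not the attached truncate.c: the
helper name truncate_and_sync, its messages and its return convention are
hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from the attached test programs):
 * truncate `path` to `length` bytes and ask for persistence.
 * Returns 0 on success, or the negated errno of the first call that failed.
 */
static int truncate_and_sync(const char *path, off_t length)
{
    int err = 0;
    int fd = open(path, O_WRONLY | O_CREAT, 0644);

    if (fd < 0) {
        err = errno;
        fprintf(stderr, "open(%s): %s\n", path, strerror(err));
        return -err;
    }

    if (ftruncate(fd, length) < 0) {
        err = errno;
        fprintf(stderr, "ftruncate: %s\n", strerror(err));
    }

    /* fsync(2) is the call that asks for persistence, so a writeback
     * failure such as EIO would be expected to show up here. */
    if (err == 0 && fsync(fd) < 0) {
        err = errno;
        fprintf(stderr, "fsync: %s\n", strerror(err));
    }

    /* close(2) can report an error too; keep the first error seen. */
    if (close(fd) < 0 && err == 0) {
        err = errno;
        fprintf(stderr, "close: %s\n", strerror(err));
    }

    return -err;
}
```

With the injected I/O errors described above, the observation in this report
is that the fsync(2) and close(2) steps of such a sequence still return
success, so a helper like this would return 0.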

> It looks like none of your operations make any data persistence calls, so
> you're not /asking/ for a return of success or failure of persistence.

In my current truncate.c, creat.c, chmod.c and chown.c, I do ask for data
persistence. I still do not see any observable warnings or errors in either
user space or kernel space.

> We'll of course log errors writing into the journal

My observations differ here: an EIO during a journal write also does not log
any error in kernel space. Unlike a failed inode block write, it does not
cause the operation to fail, because after the write to the journal the write
to the in-place location of the inode is also performed. So if I fail a
journal block write for a truncate operation and remount the file system, I
see the truncated file, since the in-place write succeeded. I am sorry that I
am unable to determine which journal structure (journal superblock, descriptor
block, commit block or data block) lies on the failed block, but I can get
back to you on this in a separate report.

Also, thank you for providing background on real disk behaviour. My only
concern is that if disk vendors only reply with EIO and depend on upper layers
to make sense of it, while file systems rely on disk vendors to log errors, we
might end up with silent corruption. Two log messages would be OK. No log
messages would be bad.

Please bear with me while I file my findings; I will follow up with patch
recommendations.

Thank you,

-- 
You are receiving this mail because:
You are watching the assignee of the bug.