https://bugzilla.kernel.org/show_bug.cgi?id=200753

--- Comment #11 from Theodore Tso (tytso@xxxxxxx) ---

On Tue, Aug 07, 2018 at 03:36:25AM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> Now, I probably would expect to see some errors in dmesg if for example
> inode flushing fails at unmount time, though. It's not strictly a bug
> to not log an error at the filesystem level, but it's probably
> desirable. Ted can probably speak to this better than I can.

In practice we do, but it's coming from lower levels of the storage stack. We'll of course log errors writing into the journal, but once the metadata updates are logged to the journal, they get written back by the buffer cache writeback functions. Immediately before the unmount they will be flushed out by the jbd2 layer in fs/jbd2/checkpoint.c, using fs/buffer.c's write_dirty_buffer() with the REQ_SYNC flag, and I/O errors get logged via buffer_io_error(). In practice, media write errors are also logged by the device drivers, so users have some idea that Bad Stuff is happening --- assuming they even look at dmesg at all, of course.

One could imagine an enhancement that teaches the file system not to use the generic buffer cache writeback functions, and instead to submit I/O requests with custom completion callbacks for metadata I/O, so that in case of an error there would be an ext4-level message explaining that writeback to inode table block XXX, affecting inodes YYY-ZZZ, failed. And if someone submitted such a patch, I'd consider it for inclusion, assuming the patch was clean, correct, and didn't introduce long-term maintenance burdens.

However, at $WORK we have a custom set of changes so that all file system errors, as well as media errors from the SCSI/SATA layer, get sent to a userspace daemon via netlink, and as necessary the disk is automatically sent to a repair workflow. The repair workflow then tells the cluster file system to stop using that disk, and then either confirms that the bad block redirection pool was able to kick in correctly, or flags the drive to be sent to a hardware operations team to replace the disk drive, etc. (The main reason why we haven't sent it upstream is that the patch as it stands today is a bit of an ugly kludge, and would have to be rewritten as a better-structured kernel->userspace error reporting mechanism --- either for the storage stack in general, or for the whole kernel. Alas, no one has had the time or energy to deal with the almost certain bike-shedding party that would ensue after proposing such a new feature. :-)

So I don't have much motivation to fix up something to log explanatory error messages from the file system level, when the device driver errors in dmesg are in practice quite sufficient for most users. Even without a custom netlink patch, you can just scrape dmesg. An example of such a userspace approach can be found here:

https://github.com/kubernetes/node-problem-detector

In practice, most production systems don't need to know exactly which file system metadata block had an I/O problem. They just need to know when the disk drive has started developing errors, and when it has, whether the drive can be restored to being a 100% functional storage device that can be safely and sanely used by the file system layer --- or whether it's time to replace it with a working drive.
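To make the dmesg-scraping idea concrete, here is a rough, untested userspace sketch (my own illustration, not code taken from node-problem-detector) that follows /dev/kmsg and flags records that look like block-layer or buffer-layer I/O errors:

/*
 * Minimal sketch of the dmesg-scraping approach: follow /dev/kmsg and
 * flag records that look like disk I/O errors.  Illustration only; a
 * real daemon would parse the kmsg record format and use a configurable
 * pattern list, which is roughly what node-problem-detector's kernel
 * monitor does.
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[8192];
	int fd = open("/dev/kmsg", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/kmsg");
		return 1;
	}
	for (;;) {
		/* each read() on /dev/kmsg returns one log record */
		ssize_t n = read(fd, buf, sizeof(buf) - 1);

		if (n < 0) {
			if (errno == EPIPE)	/* we were overrun; skip ahead */
				continue;
			perror("read /dev/kmsg");
			return 1;
		}
		buf[n] = '\0';
		/* crude pattern match; real tools use per-pattern rules */
		if (strstr(buf, "Buffer I/O error") || strstr(buf, "I/O error"))
			printf("possible disk problem: %s", buf);
	}
}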
I do agree with Eric that it would be a nice-to-have if ext4 were to log messages when an inode table block or an allocation bitmap block runs into errors while being flushed out, whether at unmount time or by the kernel's writeback threads. But it's really only a nice-to-have. Patches gratefully accepted....
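For anyone who wants to take a crack at it, a rough, untested sketch of the shape such a change could take follows (the function names are made up for illustration). It follows the write_dirty_buffer() pattern from fs/buffer.c circa 4.18, but installs a custom completion callback so the failure gets reported at the ext4 level; a real patch would presumably report via ext4_error() and map the block number back to the affected inode table or bitmap range.

/*
 * Illustrative sketch only, not existing ext4 code.  Submit a metadata
 * buffer for write with a custom b_end_io, so that a write failure is
 * reported with an ext4-level message carrying the block number,
 * instead of relying solely on the generic buffer_io_error() message.
 */
#include <linux/buffer_head.h>
#include <linux/printk.h>

static void ext4_meta_write_end_io(struct buffer_head *bh, int uptodate)
{
	if (!uptodate)
		printk(KERN_ERR "EXT4-fs: writeback of metadata block %llu failed\n",
		       (unsigned long long)bh->b_blocknr);
	end_buffer_write_sync(bh, uptodate);	/* generic unlock/refcount handling */
}

static void ext4_write_meta_buffer(struct buffer_head *bh)
{
	lock_buffer(bh);
	if (!test_clear_buffer_dirty(bh)) {
		unlock_buffer(bh);
		return;
	}
	bh->b_end_io = ext4_meta_write_end_io;
	get_bh(bh);
	submit_bh(REQ_OP_WRITE, REQ_SYNC, bh);
}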