Hi,

On 2018-04-12 01:34:45 -0400, Theodore Y. Ts'o wrote:
> The solution we use at Google is that we watch for I/O errors using a
> completely different process that is responsible for monitoring
> machine health. It used to scrape dmesg, but we now arrange to have
> I/O errors get sent via a netlink channel to the machine health
> monitoring daemon.

Any pointers to the underlying netlink mechanism? If we can force
postgres to kill itself when such an error is detected (via a dedicated
monitoring process), I'd personally be happy enough. It'd be nicer if
we could associate that knowledge with particular filesystems etc
(which'd possibly be hard through dm etc?), but this'd be much better
than nothing.

> The reality is that recovering from disk errors is tricky business,
> and I very much doubt most userspace applications, including distro
> package managers, are going to want to engineer for trying to detect
> and recover from disk errors. If that were true, then Red Hat and/or
> SuSE have kernel engineers, and they would have implemented
> everything on your wish list. They haven't, and that should tell you
> something.

The problem really isn't about *recovering* from disk errors. *Knowing*
about them is the crucial part. We do not want to tell clients that an
operation succeeded when it actually didn't. There could be
improvements above that, but as long as it's guaranteed that "we" get
the error (rather than just some kernel log we don't have access to,
which looks different due to config etc), it's ok. We can throw our
hands up in the air and give up.

> The other reality is that once a disk starts developing errors, in
> reality you will probably need to take the disk off-line, scrub it to
> find any other media errors, and there's a good chance you'll need to
> rewrite bad sectors (including some which are on top of file system
> metadata, so you probably will have to run fsck or reformat the whole
> file system). I certainly don't think it's realistic to assume adding
> lots of sophistication to each and every userspace program.
>
> If you have tens or hundreds of thousands of disk drives, then you
> will need to do something automated, but I claim that you really
> don't want to smush all of that detailed exception handling and HDD
> repair technology into each database or cluster file system component.
> It really needs to be done in a separate health-monitor and
> machine-level management system.

Yea, agreed on all that. I don't think anybody actually involved in
postgres wants to do anything like that. Seems far outside of postgres'
remit.

Greetings,

Andres Freund