On Thu, Apr 12, 2018 at 12:55:36PM -0700, Andres Freund wrote:
>
> Any pointers to that the underlying netlink mechanism? If we can force
> postgres to kill itself when such an error is detected (via a dedicated
> monitoring process), I'd personally be happy enough. It'd be nicer if
> we could associate that knowledge with particular filesystems etc
> (which'd possibly hard through dm etc?), but this'd be much better than
> nothing.

Yeah, sorry, it never got upstreamed. It's not really all that
complicated, it was just that there were some other folks who wanted
to do something similar, and there was a round of bike-shedding
several years ago, and nothing ever went upstream. Part of the
problem was that our original scheme sent up information about file
system-level corruption reports --- e.g., those stemming from calls to
ext4_error() --- and lots of people had different ideas about how to
get all of the possible information up in some structured format.
(Think something like uerf from Digital's OSF/1.)

We did something *really* simple/stupid. We just sent essentially an
ASCII text string out the netlink socket. That's because what we were
doing before was essentially scraping the output of dmesg
(e.g. /dev/kmsg).

Scraping dmesg is actually probably the simplest thing to do, and it
has the advantage that it will work even on ancient enterprise kernels
that PG users are likely to want to use. So you will need to
implement the dmesg text scraper anyway, and that's probably good
enough for most use cases.

> The problem really isn't about *recovering* from disk errors. *Knowing*
> about them is the crucial part. We do not want to give back clients the
> information that an operation succeeded, when it actually didn't. There
> could be improvements above that, but as long as it's guaranteed that
> "we" get the error (rather than just some kernel log we don't have
> access to, which looks different due to config etc), it's ok. We can
> throw our hands up in the air and give up.

Right, it's a little challenging because the actual regexps you would
need to use do vary from device driver to device driver. Fortunately
nearly everything is a SCSI/SATA device these days, so there isn't
_that_ much variability.

> Yea, agreed on all that. I don't think anybody actually involved in
> postgres wants to do anything like that. Seems far outside of postgres'
> remit.

Some people on the pg-hackers list were talking about wanting to retry
the fsync() and hoping that would cause the write to somehow succeed.
It's *possible* that might help, but it's not likely to be helpful in
my experience.

Cheers,

					- Ted
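
For illustration only, here is a minimal sketch of the kind of dmesg
text scraper discussed above. It is not the code that was used
internally. The names (PATTERNS, StorageError, classify, scan) are
hypothetical, and the three regexps are assumptions based on commonly
seen kernel messages. As noted above, the exact wording varies by
driver and kernel version, so a real deployment would need its own
list.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# Illustrative patterns only (assumptions, not an exhaustive list).
# The exact message text differs between kernel versions and drivers.
PATTERNS = [
    ("block", re.compile(r"I/O error, dev (?P<dev>\S+), sector \d+")),
    ("buffer", re.compile(
        r"Buffer I/O error on dev(?:ice)? (?P<dev>[^,\s]+), logical block \d+")),
    ("ext4", re.compile(r"EXT4-fs error \(device (?P<dev>[^)]+)\)")),
]

# /dev/kmsg records start with "prio,seq,timestamp,flags;" before the text.
KMSG_HEADER = re.compile(r"\d+,\d+,\d+,[^;]*")


class StorageError(NamedTuple):
    kind: str      # which pattern matched
    device: str    # device name as printed by the kernel
    message: str   # the log text with any /dev/kmsg header removed


def strip_kmsg_header(line: str) -> str:
    """Drop the /dev/kmsg record header if present; pass dmesg lines through."""
    header, sep, body = line.partition(";")
    if sep and KMSG_HEADER.fullmatch(header):
        return body
    return line


def classify(line: str) -> Optional[StorageError]:
    """Return a StorageError if the line looks like a storage error, else None."""
    text = strip_kmsg_header(line.rstrip("\n"))
    for kind, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return StorageError(kind, match.group("dev"), text)
    return None


def scan(lines: Iterable[str]) -> Iterator[StorageError]:
    """Yield one StorageError per matching line from any iterable of log lines."""
    for line in lines:
        hit = classify(line)
        if hit is not None:
            yield hit
```

A monitoring process could feed scan() from sys.stdin with
"dmesg --follow" piped into it, or from records read out of /dev/kmsg.
On the first hit for a device it cares about, it would take whatever
action is wanted, such as signalling postgres to shut down. Mapping a
device name back to a particular filesystem (e.g. through dm) is left
out of this sketch.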