On Wed, 12 Mar 2014 04:00:15 -0700 Christoph Hellwig <hch@xxxxxxxxxxxxx> wrote:

> The SLES12 tree has various patches to implement special
> O_DIRECT|O_NONBLOCK semantics for block devices:
>
> https://gitorious.org/opensuse/kernel-source/source/806eab3e4b02e798c1ae942440051f81c822ca35:patches.suse/block-nonblock-causes-failfast
>
> this seems genuinely useful and I'd be really happy if people would do
> this work upstream for two reasons:
>
> a) implementing different semantics only in a vendor kernel is a
>    nightmare. No proper way to document it in the man pages for
>    example, and silent breakage of applications that expect it to be
>    present, or even more nasty not present.
> b) Which brings us to: we had various issues with adding O_NONBLOCK to
>    files that didn't support it before. How well was this whole feature
>    tested?

This "feature" was really just a hack because a particular customer needed
something in a particular situation.

At the core of this in my thinking is the 'failfast' BIO flag ... or 'flags'
really, because there are now three of them. They don't seem to be documented,
uniformly supported, or used much at all. dm-multipath uses one, and btrfs
uses another. There could be value in using one or more of them in md, but as
they aren't documented and could mean almost anything, I have stayed away.

I tried adding some sort of 'failfast' support to md once, and I would get
occasional failures from regular SATA devices which otherwise appeared to be
working perfectly well. So it seemed that "fast" was altogether *too* fast.

For a particular customer with some particular hardware there were issues
where that hardware could choose not to respond for extended periods. So we
modified the driver to accept a 'timeout' module parameter and to cause
REQ_FAILFAST_DEV (I think) requests to fail with -ETIMEDOUT if they could not
be serviced in that time. We then modified md to cope with that particular
well-defined semantic, and hacked "O_NONBLOCK" support in so that mdadm could
access the device without the risk of hanging indefinitely.

I would be happy to bring at least some of this functionality into mainline,
but I would need a "FAILFAST" flag that actually meant something useful and
was sufficiently well documented that, if some driver got it wrong, I would
be justified in blaming the driver for not meeting the expectations that I
encoded into md.

I think that the FAILFAST flag that I need would do some error recovery, but
would be time limited. Maybe a software TLER (Time Limited Error Recovery).

I also think there should probably be just one FAILFAST flag. Whether it was
the DEV or the TRANSPORT or the DRIVER that failed could be returned in the
error code for any caller that cared. But as I don't know why the one became
three, I could well be missing something important.

As for testing, only basic "does it function as expected" testing. Part of
the reason for only modifying O_NONBLOCK behaviour where O_DIRECT was also
set was to make it extremely unlikely that any code would use this feature
except code that specifically needed it.

NeilBrown
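
For illustration only, the "software TLER" idea and the single-flag proposal
described above can be modelled in a few lines of userspace C. This is a
rough sketch under stated assumptions, not kernel code and not the SLES
patch: every name in it (ff_submit, ff_source, ff_result, fake_dev,
fake_attempt) is hypothetical, time is simulated rather than measured, and
the choice to keep retrying until a millisecond budget is spent is just one
possible reading of "some error recovery, but time limited".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: which layer gave up. The mail suggests reporting this to
 * the caller instead of having three separate FAILFAST flags. */
enum ff_source { FF_OK = 0, FF_DEV, FF_TRANSPORT, FF_DRIVER };

struct ff_result {
    int err;               /* 0 on success, else a negative errno */
    enum ff_source source; /* layer that failed last, FF_OK on success */
    unsigned attempts;     /* number of attempts made */
};

/* One attempt at the I/O. Returns FF_OK or the failing layer, and reports
 * the (simulated) time the attempt took in milliseconds. */
typedef enum ff_source (*ff_attempt_fn)(void *ctx, uint32_t *elapsed_ms);

/* Retry until the attempt succeeds or the time budget is used up. */
static struct ff_result ff_submit(ff_attempt_fn attempt, void *ctx,
                                  uint32_t budget_ms)
{
    struct ff_result r = { 0, FF_OK, 0 };
    uint32_t spent = 0;

    assert(attempt != NULL);
    for (;;) {
        uint32_t took = 0;
        enum ff_source s = attempt(ctx, &took);

        r.attempts++;
        spent += took ? took : 1; /* always make progress towards the budget */
        if (s == FF_OK) {
            r.source = FF_OK;
            return r;
        }
        r.source = s;             /* remember who failed for the caller */
        if (spent >= budget_ms) {
            r.err = -ETIMEDOUT;   /* recovery was attempted, but time ran out */
            return r;
        }
    }
}

/* Hypothetical fake device: fails 'fail_first' times, then succeeds. */
struct fake_dev {
    unsigned fail_first;
    uint32_t ms_per_try;
    enum ff_source failing_layer;
};

static enum ff_source fake_attempt(void *ctx, uint32_t *elapsed_ms)
{
    struct fake_dev *d = ctx;

    *elapsed_ms = d->ms_per_try;
    if (d->fail_first > 0) {
        d->fail_first--;
        return d->failing_layer;
    }
    return FF_OK;
}
```

In this model a caller such as md would only need to check for -ETIMEDOUT,
and could look at the source field if it cared which layer gave up. A device
that fails twice at 100 ms per try succeeds on the third attempt within a
1000 ms budget, while one that never recovers is abandoned once the budget
is spent.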