Re: [Lsf-pc] [LSF/MM TOPIC] I/O error handling and fsync()

"Theodore Ts'o" <tytso@xxxxxxx> · Thu, 26 Jan 2017 22:23:18 -0500

On Fri, Jan 27, 2017 at 09:19:10AM +1100, NeilBrown wrote:
> I don't think it has.
> The original topic was about graceful handling of recoverable IO errors.
> The question was framed as about retrying fsync() if it reported an
> error, but this was based on a misunderstanding.  fsync() doesn't report
> an error for recoverable errors.  It hangs.
> So the original topic is really about gracefully handling IO operations
> which currently can hang indefinitely.

Well, the problem is that it is up to the device driver to decide
whether an error is recoverable.  This might include waiting X minutes,
and then deciding that the fibre channel connection isn't coming back,
and then turning it into an unrecoverable error.  Or for other
devices, the timeout might be much smaller.

Which is fine --- I think that's where the decision ought to live, and
if users want to tune a different timeout before the driver stops
waiting, that should be between the system administrator and the
device driver /sys tuning knob.
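
As one concrete illustration of such a knob (my choice of example, not
something specific to this thread): SCSI disks expose a per-device
command timeout, in seconds, at /sys/block/<disk>/device/timeout.  A
minimal sketch of reading it follows.  The helper names are made up
for illustration, and other drivers keep their timeouts elsewhere.

```c
#include <stdio.h>

/* Hypothetical helper: build the sysfs path of the SCSI command timeout
 * for a disk name such as "sda".  Returns 0 on success, -1 if the
 * buffer is too small. */
int scsi_timeout_path(const char *disk, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/sys/block/%s/device/timeout", disk);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: read the timeout in seconds.  Returns -1 if the
 * attribute does not exist or cannot be parsed. */
long read_scsi_timeout(const char *disk)
{
    char path[256];
    long secs = -1;
    FILE *f;

    if (scsi_timeout_path(disk, path, sizeof(path)) != 0)
        return -1;
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%ld", &secs) != 1)
        secs = -1;
    fclose(f);
    return secs;
}
```

An administrator would normally just cat or echo to that file; the
point is only that the policy lives with the driver, not in the
application.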

> >> When combined with O_DIRECT, it effectively means "no retries".  For
> >> block devices and files backed by block devices,
> >> REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT is used and a failure will be
> >> reported as EWOULDBLOCK, unless it is obvious that retrying wouldn't
> >> help.

Absolutely no retries?  Even TCP retries in the case of iSCSI?  I
don't think turning every TCP packet drop into EWOULDBLOCK would make
sense under any circumstances.  What might make sense is to have a
"short timeout" where it's up to the block device to decide what
"short timeout" means.

EWOULDBLOCK is also a little misleading, because even if the I/O
request is submitted immediately to the block device and immediately
serviced and returned, the I/O request would still be "blocking".
Maybe ETIMEDOUT instead?
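
To make the distinction concrete, here is a rough sketch of how a
caller might treat the two codes differently.  The enum, the function
name, and the policy itself are hypothetical; nothing like this was
proposed in the thread.  Note that on Linux EWOULDBLOCK has the same
value as EAGAIN, so callers already read it as "nothing was attempted".

```c
#include <errno.h>

/* Hypothetical policy, for illustration only. */
enum io_action {
    IO_RETRY_NOW,    /* request was never attempted; resubmit */
    IO_RETRY_LATER,  /* device was tried and a short timeout expired */
    IO_GIVE_UP       /* treat as a hard failure */
};

enum io_action classify_io_errno(int err)
{
    switch (err) {
    case EINTR:
    case EAGAIN:
#if EWOULDBLOCK != EAGAIN
    case EWOULDBLOCK:
#endif
        return IO_RETRY_NOW;
    case ETIMEDOUT:
        return IO_RETRY_LATER;
    default:
        return IO_GIVE_UP;
    }
}
```

With ETIMEDOUT the caller can tell "the device did not answer in time"
apart from "this call would have had to sleep", which is the
information a fail-fast user actually wants.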

> And aio_write() isn't non-blocking for O_DIRECT already because .... oh,
> it doesn't even try.  Is there something intrinsically hard about async
> O_DIRECT writes, or is it just that no-one has written acceptable code
> yet?

AIO/DIO writes can indeed be non-blocking, if the file system doesn't
need to do any metadata operations.  So if the file is preallocated,
you should be able to issue an async DIO write without losing the CPU.
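
For reference, a minimal sketch of that call sequence using the raw
Linux AIO syscalls.  The function name, the 4096-byte alignment, and
the single-block write are assumptions for illustration.  Whether
io_submit() really returns without blocking depends on the file system
and kernel version; the sketch only shows the sequence.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for O_DIRECT and syscall() */
#endif
#include <errno.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

#define BLK 4096  /* assumed alignment; the real requirement is device-specific */

/* Hypothetical helper: preallocate one block, then write it with
 * AIO + O_DIRECT.  Returns 0 on success or a negative errno value. */
int aio_dio_write_prealloc(const char *path)
{
    aio_context_t ctx = 0;
    struct iocb cb, *cbs[1] = { &cb };
    struct io_event ev;
    void *buf = NULL;
    long n;
    int fd, ret;

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0600);
    if (fd < 0)
        return -errno;

    /* Preallocate so the write should not need block allocation.
     * posix_fallocate() returns an errno value directly. */
    ret = posix_fallocate(fd, 0, BLK);
    if (ret) { ret = -ret; goto out_close; }

    /* O_DIRECT needs an aligned buffer. */
    if (posix_memalign(&buf, BLK, BLK)) { ret = -ENOMEM; goto out_close; }
    memset(buf, 'x', BLK);

    if (syscall(SYS_io_setup, 1, &ctx) < 0) { ret = -errno; goto out_free; }

    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = IOCB_CMD_PWRITE;
    cb.aio_fildes = fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = BLK;
    cb.aio_offset = 0;

    n = syscall(SYS_io_submit, ctx, 1, cbs);
    if (n != 1) { ret = n < 0 ? -errno : -EIO; goto out_ctx; }

    /* The caller could do other work here before reaping the event. */
    n = syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL);
    if (n != 1) { ret = n < 0 ? -errno : -EIO; goto out_ctx; }

    if (ev.res != BLK)
        ret = ev.res < 0 ? (int)ev.res : -EIO;
out_ctx:
    syscall(SYS_io_destroy, ctx);
out_free:
    free(buf);
out_close:
    close(fd);
    return ret;
}
```

Note that O_DIRECT is not supported everywhere (tmpfs, for example),
so open() can fail with EINVAL on some file systems.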

> A truly async O_DIRECT aio_write() combined with a working io_cancel()
> would probably be sufficient.  The block layer doesn't provide any way
> to cancel a bio though, so that would need to be wired up.

Kent Overstreet worked up io_cancel for AIO/DIO writes when he was at
Google.  As I recall the patchset did get posted a few times, but it
never ended up getting accepted for upstream adoption.
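
The userspace side of this already exists as the io_cancel(2) system
call; what is missing is kernel support behind it for these requests.
A minimal sketch of the call, with a made-up wrapper name.  Since the
AIO/DIO cancellation work was not merged, expect an error rather than
a successful cancel for such writes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() */
#endif
#include <errno.h>
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper: ask the kernel to cancel a request that was
 * previously submitted with io_submit().  Returns 0 if the request was
 * cancelled, or a negative errno value otherwise. */
int try_cancel(aio_context_t ctx, struct iocb *cb)
{
    struct io_event ev;

    if (syscall(SYS_io_cancel, ctx, cb, &ev) == 0)
        return 0;
    return -errno;
}
```

A caller that gets an error back still has to wait for the completion
event via io_getevents(), which is exactly the hang being discussed.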

We even had some very rough code that would propagate the cancellation
request to the hard drive, for those hard drives that had a facility
for accepting a cancellation request for an I/O which was queued via
NCQ but which hadn't executed yet.  It sort-of worked, but it never
hit a state where it could be published before the project was
abandoned.

						- Ted
