Informing kernel that a block device is offlined

Alvin Abitria <abitria.alvin@xxxxxxxxx> · Wed, 11 Nov 2015 11:01:57 +0800

Hello,

Good day.  My inquiry is about block drivers.  Suppose our device
encounters a problem and all error handling fails.  The only remaining
option I see for the driver is to take the device offline.  The
intention is for the system to be notified so that it stops sending
requests to the device and the device becomes inaccessible, so that
bad behavior, such as a system crash, is avoided.  This should work
whether the device is used standalone or as part of a RAID array.

The question is how to do it cleanly.  I've searched existing drivers
in the kernel source (like sd, skd, etc.), and all they do (as far as
I can see) is complete subsequent requests with -EIO
(__blk_end_request_all(rq, -EIO)).  So far this works with fio,
because once that happens and all fio requests are completed with
-EIO, fio stops sending further requests.  But when the device is part
of a RAID array and an unrecoverable fault is triggered, just
completing requests with an error does not seem to work.  RAID detects
the errors, but can't remove the failing device from the array.
Sometimes the kernel still sends requests to the faulty RAID member
(which should stop).  The array becomes degraded (in a not-so-good
way) and the system eventually crashes.
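
To make the current approach concrete, here is a minimal userspace
model of it.  This is only a sketch, not kernel code: mock_request,
mock_device, mock_end_request, mock_set_offline and
mock_handle_requests are hypothetical names made up for illustration,
and mock_end_request merely stands in for
__blk_end_request_all(rq, error).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT kernel structures. */
struct mock_request {
    bool completed;   /* has the request been completed? */
    int  error;       /* 0 on success, negative errno on failure */
};

struct mock_device {
    bool offline;     /* set once all error handling has failed */
};

/* Stand-in for __blk_end_request_all(rq, error). */
static void mock_end_request(struct mock_request *rq, int error)
{
    assert(rq != NULL);
    rq->completed = true;
    rq->error = error;
}

/* Mark the device as permanently failed for this session. */
static void mock_set_offline(struct mock_device *dev)
{
    dev->offline = true;
}

/*
 * Model of the request handler: once the device is offline, every
 * request is completed with -EIO without touching the hardware.
 * Returns the number of requests that were failed this way.
 */
static size_t mock_handle_requests(struct mock_device *dev,
                                   struct mock_request *rqs, size_t n)
{
    size_t failed = 0;

    for (size_t i = 0; i < n; i++) {
        if (dev->offline) {
            mock_end_request(&rqs[i], -EIO);
            failed++;
        } else {
            /* Normal path: pretend the hardware serviced the request. */
            mock_end_request(&rqs[i], 0);
        }
    }
    return failed;
}
```

Note that the model only fails requests after they have been
submitted.  Nothing in it tells the submitter to stop, which is the
gap I am asking about.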

So how does a block driver properly inform the upper layers that its
device is faulty and must not be used again for that session?

Best regards,
Alvin

_______________________________________________
Kernelnewbies mailing list
Kernelnewbies@xxxxxxxxxxxxxxxxx
http://lists.kernelnewbies.org/mailman/listinfo/kernelnewbies