This patch adds Documentation/block/request-based-drivers-ordering.txt,
which attempts to explain the ordering requirements for writing
request-based drivers, as explained to me by Christoph Hellwig and
Jan Kara. I only have this second hand, so please check it for accuracy.

I'm hoping this file will be useful to anyone writing a request-based
driver. I think much of it applies to bio-based drivers too, but I don't
know enough about them to comment.

Signed-off-by: Alex Bligh <alex@xxxxxxxxxxx>
---
 Documentation/block/00-INDEX                   |    2 +
 .../block/request-based-drivers-ordering.txt   |   72 ++++++++++++++++++++
 2 files changed, 74 insertions(+), 0 deletions(-)

diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX
index d111e3b..ea60bc5 100644
--- a/Documentation/block/00-INDEX
+++ b/Documentation/block/00-INDEX
@@ -10,6 +10,8 @@ ioprio.txt
 	- Block io priorities (in CFQ scheduler)
 request.txt
 	- The members of struct request (in include/linux/blkdev.h)
+request-based-drivers-ordering.txt
+	- Description of ordering requirements for request-based drivers
 stat.txt
 	- Block layer statistics in /sys/block/<dev>/stat
 switching-sched.txt
diff --git a/Documentation/block/request-based-drivers-ordering.txt b/Documentation/block/request-based-drivers-ordering.txt
new file mode 100644
index 0000000..77a6888
--- /dev/null
+++ b/Documentation/block/request-based-drivers-ordering.txt
@@ -0,0 +1,72 @@
+Ordering requirements for request-based drivers
+===============================================
+
+Request-based drivers may handle requests in any order. This means:
+
+a) the requests may be handled (i.e. read from or written to the
+   device) in any order
+
+b) the completions may be done in any order (which may be different from
+   the order in which the requests were submitted and/or from
+   the order the requests were handled).
+
+This is subject ONLY to the restriction that a REQ_FLUSH command
+may not be completed until all writes that were completed prior
+to the REQ_FLUSH being received have been written to disk. It is
+not strictly necessary to ensure that writes received after
+the REQ_FLUSH but completed prior to it are also written to
+disk, but this is advisable. Note that request-based drivers
+will only receive a REQ_FLUSH on an empty request (i.e. with
+no associated data).
+
+REQ_FUA has no ordering implications.
+
+This has the following results:
+
+1. If a write is submitted to a driver covering block X, and
+   a second write is submitted also covering block X prior to
+   completion of the first write, the driver may reorder the
+   writes, meaning that the value of block X on disk
+   cannot be determined.
+
+2. If a read is submitted to a driver covering block X, and
+   a write is submitted also covering block X prior to the
+   completion of the read, the value of block X in the read
+   cannot be determined.
+
+3. If a write is submitted to a driver covering block X, and
+   a read is submitted also covering block X prior to the
+   completion of the write, the value of block X in the read
+   cannot be determined.
+
+Neither REQ_FLUSH nor REQ_FUA represents a barrier. Thus note the
+following:
+
+1. A write issued after a REQ_FLUSH may be written to disk
+   before the REQ_FLUSH is completed, irrespective of whether
+   the completion is received prior to or after the
+   completion of the REQ_FLUSH.
+
+2. A write issued before a REQ_FLUSH might not be written to
+   disk if it is not completed before the REQ_FLUSH is
+   completed (though drivers should avoid this behaviour).
+
+3. A write with REQ_FUA issued subsequent to a REQ_FLUSH
+   may be performed prior to the flush going to disk.
+
+4. Writes with REQ_FUA set may be reordered just as any
+   other writes can be reordered.
+
+You should not rely on the kernel sending your driver REQ_FLUSH.
+Specifically:
+
+1. You will not receive a REQ_FLUSH during filing system operation
+   unless the filing system itself supports it.
+
+2. There is no guarantee that you will receive a REQ_FLUSH on a umount()
+   of a device.
+
+3. There is no guarantee you will receive a REQ_FLUSH when your device
+   ceases to be used, is unplugged, etc. - it is up to you to
+   ensure data is appropriately flushed.
+
-- 
1.7.4.1
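
As an illustration of the REQ_FLUSH rule in the file above, here is a toy
model in Python. It is not kernel code and is only a sketch: the class
ToyDriver and its methods are hypothetical names invented for this example,
it assumes a device with a volatile write cache, and it treats a flush as
being received and completed in a single step.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDriver:
    """Toy model of the REQ_FLUSH rule (hypothetical, not a kernel API)."""
    cache: dict = field(default_factory=dict)     # completed writes, not yet on disk
    disk: dict = field(default_factory=dict)      # contents of stable storage
    inflight: dict = field(default_factory=dict)  # tag -> (block, data), not completed
    next_tag: int = 0

    def submit_write(self, block, data):
        """Accept a write and return a tag used to complete it later."""
        tag = self.next_tag
        self.next_tag += 1
        self.inflight[tag] = (block, data)
        return tag

    def complete_write(self, tag):
        """Complete a write, in any order; the data lands in the cache only."""
        block, data = self.inflight.pop(tag)
        self.cache[block] = data

    def flush(self):
        """Model REQ_FLUSH: every write completed so far reaches the disk
        before the flush completes. Writes still in flight are not covered."""
        self.disk.update(self.cache)
        self.cache.clear()


drv = ToyDriver()
a = drv.submit_write(7, "old")
b = drv.submit_write(7, "new")
drv.complete_write(b)   # completions need not follow submission order
drv.flush()             # covers b only; a has not completed yet
drv.complete_write(a)   # completed after the flush, so it is only cached
print(drv.disk)         # {7: 'new'}
print(drv.cache)        # {7: 'old'}
```

In this model the write tagged a was issued before the flush but is not on
disk when the flush completes, and because both writes cover block 7 and
complete in either order, a later flush would leave "old" on disk instead.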