I am writing (well, designing at the moment) a block driver where there is a very high variance in the time taken to write the underlying blocks. I have some questions about the interface to block drivers which I would be really grateful if someone could answer. I believe they are not answered by existing documentation, and I am happy to write up and contribute something for Documentation/ in return for answers.

Q1: This may seem pretty basic, but I presume I am allowed to answer requests submitted to my block driver in an order other than the order in which they are submitted. I.e. can I do this:

Receive          Reply
=======          =====
READ1
READ2
                 REPLY READ2
                 REPLY READ1

My understanding is "yes". If the answer is "no", I am overcomplicating things and most of the rest of this is irrelevant.

Q2: Will I ever get a sequence where I receive a read for a block for which I have already received, but not yet responded to, a write? If so, I presume I have to ensure I do not send back "old data", but that reordering the replies is still acceptable, i.e. this is OK, but different values in the replies are not:

Receive          Reply
=======          =====
WRITE1 blkX=A
READ1 blkX
WRITE2 blkX=B
READ2 blkX
                 REPLY READ2=B
                 REPLY READ1=A
                 REPLY WRITE1
                 REPLY WRITE2

Q3: Apparently there are no longer concepts of barriers, just REQ_FLUSH and REQ_FUA. REQ_FLUSH guarantees all "completed" I/O requests are written to disk prior to that BIO starting. However, what about non-completed I/O requests? For instance, is the following legitimate:

Receive          Send to disk     Reply
=======          ============     =====
WRITE1
WRITE2
                                  WRITE2 (cached)
FLUSH+WRITE3
                 WRITE2
                 WRITE3
                                  WRITE3
WRITE4
                 WRITE4
                                  WRITE4
                 WRITE1
                                  WRITE1

Here WRITE1 was not "completed" when the flush arrived, and thus, by the text of Documentation/block/writeback_cache_control.txt, need not be written to disk before starting WRITE3 (which had REQ_FLUSH attached):
The REQ_FLUSH flag can be OR ed into the r/w flags of a bio submitted from the filesystem and will make sure the volatile cache of the storage device has been flushed before the actual I/O operation is started. This explicitly guarantees that previously completed write requests are on non-volatile storage before the flagged bio starts.
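To pin down what I mean by the literal reading of that paragraph, here is a small user-space model in C. It is only a sketch: it is not kernel code, and every name in it (trace_ok, struct event, q3_trace and so on) is made up for illustration. It replays a trace of receive / disk / reply events and, when a write flagged as a flush reaches the disk, checks that the required earlier writes are already there. With strict == false it requires only the writes that had been replied to before the flush was received (my reading of "previously completed"). With strict == true it requires every write received before the flush, which is a barrier-like rule that I am assuming for comparison and that the documentation does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WRITES 16

/* Hypothetical trace model; none of these names are kernel APIs. */
enum ev_kind {
    EV_RECV,   /* driver receives a write request */
    EV_DISK,   /* write reaches non-volatile storage */
    EV_REPLY,  /* driver replies to (completes) the write */
};

struct event {
    enum ev_kind kind;
    int id;      /* write number, 1..MAX_WRITES */
    bool flush;  /* read on EV_RECV only: write carries a flush flag */
};

/*
 * Returns true if the trace never lets a flush-flagged write reach the
 * disk while a write it must wait for is still not on disk.
 *   strict == false: wait for writes replied to before the flush arrived.
 *   strict == true:  wait for writes received before the flush arrived
 *                    (assumed barrier-like rule, for comparison only).
 */
bool trace_ok(const struct event *ev, size_t n, bool strict)
{
    size_t recv_at[MAX_WRITES + 1] = { 0 };   /* 0 means "not yet" */
    size_t reply_at[MAX_WRITES + 1] = { 0 };
    bool on_disk[MAX_WRITES + 1] = { false };
    bool is_flush[MAX_WRITES + 1] = { false };

    for (size_t i = 0; i < n; i++) {
        int id = ev[i].id;

        assert(id >= 1 && id <= MAX_WRITES);

        switch (ev[i].kind) {
        case EV_RECV:
            recv_at[id] = i + 1;
            is_flush[id] = ev[i].flush;
            break;
        case EV_REPLY:
            reply_at[id] = i + 1;
            break;
        case EV_DISK:
            if (is_flush[id]) {
                for (int w = 1; w <= MAX_WRITES; w++) {
                    size_t t = strict ? recv_at[w] : reply_at[w];

                    if (t != 0 && t < recv_at[id] && !on_disk[w])
                        return false;
                }
            }
            on_disk[id] = true;
            break;
        }
    }
    return true;
}

/* The Q3 sequence from this mail, one event per table row. */
const struct event q3_trace[] = {
    { EV_RECV,  1, false },
    { EV_RECV,  2, false },
    { EV_REPLY, 2, false },  /* WRITE2 (cached) */
    { EV_RECV,  3, true  },  /* FLUSH+WRITE3 */
    { EV_DISK,  2, false },
    { EV_DISK,  3, false },
    { EV_REPLY, 3, false },
    { EV_RECV,  4, false },
    { EV_DISK,  4, false },
    { EV_REPLY, 4, false },
    { EV_DISK,  1, false },
    { EV_REPLY, 1, false },
};
const size_t q3_len = sizeof(q3_trace) / sizeof(q3_trace[0]);
```

In this model trace_ok(q3_trace, q3_len, false) returns true, while trace_ok(q3_trace, q3_len, true) returns false because WRITE1 is not yet on disk when WRITE3 is written. So the Q3 sequence passes under the literal reading of the documentation and only fails under the assumed barrier-like rule.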
I presume the Q3 sequence is in fact illegal, and that this is a documentation issue.

Q4: Can I reorder write requests forwards across flushes? I.e. can I do this:

Receive          Send to disk     Reply
=======          ============     =====
WRITE1
                                  WRITE2 (cached)
WRITE2
                                  WRITE2 (cached)
FLUSH+WRITE3
WRITE4
                 WRITE4
                                  WRITE4
                 WRITE2
                 WRITE3
                                  WRITE3

Again this does not appear to be illegal, as the FLUSH operation is not defined as a barrier. In theory it should therefore be possible to handle (and write to disk) requests received after the FLUSH request before the FLUSH request finishes, provided that the commands received before the FLUSH request themselves complete before the FLUSH request is replied to.

I really don't know what the answer is to this one. It makes a big difference to me, as I can write multiple blocks in parallel and would really rather not hold up later write requests until everything is flushed unless I need to.

Any assistance gratefully received!

--
Alex Bligh