Tejun,
--On 25 May 2011 10:59:50 +0200 Tejun Heo <tj@xxxxxxxxxx> wrote:
> Yeap, that's correct. Ordering between flush and other writes are now
> completely the responsibility of filesystems. Block layer just
> doesn't care.
> ...
> A FLUSH command means "flush out all data from writes upto this
> point". If a driver has indicated completion of a write and then
> received a FLUSH, the data from the write should be written to disk.
So, to be clear:
a) If I do not complete a write command, I may avoid writing it to disk
indefinitely (despite completing subsequently received FLUSH
commands). The only writes that I am obliged to flush to disk
are those whose completion I have actually reported to the block layer.
b) If I receive a flush command, and prior to completing that flush
command, I receive subsequent write commands, I may execute
(and, if I like, write to disk) write commands received AFTER that
flush command. I presume if the subsequent write commands write to
blocks that I am meant to be flushing, I can just forget about
the blocks I am meant to be flushing (because they would be
overwritten) provided *something* has overwritten what was there before.
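
To make (a) concrete, here is a toy model in Python. It is only a sketch
of my understanding, not kernel code: the class ToyDriver and all of its
method names are made up for illustration, and the flush is handled
atomically, so the interleaving described in (b) is not modelled.

```python
class ToyDriver:
    """Toy model of a block driver with a volatile write cache (hypothetical)."""

    def __init__(self):
        self.pending = {}   # request id -> (block, data): received, not completed
        self.cache = {}     # block -> data: completion reported, not yet on disk
        self.disk = {}      # block -> data: contents of non-volatile storage
        self.next_id = 0

    def receive_write(self, block, data):
        """Accept a write; nothing is promised until it is completed."""
        req = self.next_id
        self.next_id += 1
        self.pending[req] = (block, data)
        return req

    def complete_write(self, req):
        """Report completion; the data may still sit only in the volatile cache."""
        block, data = self.pending.pop(req)
        self.cache[block] = data

    def flush(self):
        """Persist every write whose completion was reported before this flush."""
        self.disk.update(self.cache)
        self.cache.clear()


drv = ToyDriver()
w1 = drv.receive_write(1, "X")
drv.complete_write(w1)           # completion reported before the flush
w2 = drv.receive_write(2, "Y")   # received, but never completed
drv.flush()
```

After the flush, only block 1 has to be on disk in this model; the write
to block 2 is still pending and the flush places no obligation on it.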
If my understanding is correct, then for future readers of the archive
(perhaps I should put this list in Documentation/ ?) the semantics are
something like:
1. Block drivers may handle requests received in any order, and may
issue completions in any order, subject only to the rules below.
2. If a read covering a given block X is received after one or more writes
for that block, then irrespective of the order in which the read
and write(s) are handled/completed, the read shall return the
value written by the immediately preceding write to that block.
Therefore whilst the following is legal...
    Driver sends             Driver replies
    WRITE BLOCK 1 = X
                             WRITE BLOCK 1 COMPLETED
    .... time passes ...
    READ BLOCK 1
    WRITE BLOCK 1 = Y
                             WRITE BLOCK 1 COMPLETED
                             READ BLOCK 1 COMPLETED
...the read from block 1 should return X and not Y, even if the driver
handled it after the second write.
3. If a flush request is received, then before completing it (and,
in the case of a make_request_function driver, before initiating
any attached write), the driver MUST have written to non-volatile
storage any writes which were COMPLETED prior to the reception
of the flush. This does not affect any writes received, but
not completed, prior to the flush, nor does it prevent a block driver
from completing subsequently issued writes before completion of the
flush. I.e. the flush does not act as a barrier; it merely ensures that
on completion of the flush non-volatile storage contains either the
blocks written to prior to the flush or blocks written to in commands
issued subsequent to the flush, but completed prior to it.
4. Requests marked FUA should be written to non-volatile storage prior
to completion, but impose no restrictions on ordering.
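
Rule 2 is the one that is easiest to get wrong when handling requests
out of order, so here is one way it could be honoured, again as a toy
Python sketch rather than driver code. The class OrderedBlockStore, its
methods and the per-block history list are all assumptions of mine: each
request is stamped with a sequence number on receipt, and a read returns
the newest write that was received before it, no matter when the read is
actually handled.

```python
class OrderedBlockStore:
    """Toy model of rule 2: reads see the last write received before them."""

    def __init__(self):
        self.seq = 0        # receive-order counter shared by reads and writes
        self.history = {}   # block -> list of (seq, data), in receive order

    def receive_write(self, block, data):
        """Record a write together with its position in the receive order."""
        self.seq += 1
        self.history.setdefault(block, []).append((self.seq, data))
        return self.seq

    def receive_read(self, block):
        """Stamp a read on receipt; it may be handled at any later time."""
        self.seq += 1
        return (self.seq, block)

    def handle_read(self, ticket):
        """Return the data of the newest write received before the read."""
        read_seq, block = ticket
        result = None       # None stands for "never written" in this model
        for write_seq, data in self.history.get(block, []):
            if write_seq > read_seq:
                break       # this write arrived after the read; ignore it
            result = data
        return result


store = OrderedBlockStore()
store.receive_write(1, "X")
ticket = store.receive_read(1)
store.receive_write(1, "Y")
result = store.handle_read(ticket)   # handled after the second write
```

This mirrors the exchange in rule 2: the read is handled last, yet
result is "X" because the write of Y was received after the read.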
--
Alex Bligh