Re: Questions on block drivers, REQ_FLUSH and REQ_FUA

Hello, Alex.

On Wed, May 25, 2011 at 5:54 PM, Alex Bligh <alex@xxxxxxxxxxx> wrote:
> a) If I do not complete a write command, I may avoid writing it to disk
>  indefinitely (despite completing subsequently received FLUSH
>  commands). The only writes that I am obliged to flush to disk
>  are those that I've actually told the block layer that I have done.

Yes, the driver doesn't have any ordering responsibility w.r.t. FLUSH
for writes which it hasn't yet reported as completed.
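
A minimal sketch of that rule, as a toy Python model rather than kernel
code. The class name ToyDisk and its methods are hypothetical, and the
flush is assumed to run atomically, which a real driver does not get
for free:

```python
class ToyDisk:
    """Toy model (not kernel code): a device with a volatile write cache."""

    def __init__(self):
        self.media = {}      # non-volatile contents: block -> data
        self.completed = {}  # writes reported as completed, still only cached
        self.in_flight = {}  # writes received but not yet completed

    def submit_write(self, block, data):
        # Received, but completion has not been reported yet.
        self.in_flight[block] = data

    def complete_write(self, block):
        # Report completion; the data may still sit only in the cache.
        self.completed[block] = self.in_flight.pop(block)

    def flush(self):
        # A FLUSH only has to cover writes that were already completed.
        self.media.update(self.completed)
        self.completed.clear()


disk = ToyDisk()
disk.submit_write(1, "A")
disk.complete_write(1)
disk.submit_write(2, "B")  # never completed before the flush
disk.flush()
print(disk.media)  # {1: 'A'}
```

Block 2 stays off the media after the flush, which is allowed because
its write was never reported as completed.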

> b) If I receive a flush command, and prior to completing that flush
>  command, I receive subsequent write commands, I may execute
>  (and, if I like, write, to disk) write commands received AFTER that
>  flush command. I presume if the subsequent write commands write to
>  blocks that I am meant to be flushing, I can just forget about
>  the blocks I am meant to be flushing (because they would be
>  overwritten) provided *something* overwrote what was there before.

The first half is correct.  The latter half may be correct if there's
no intervening write but _please_ don't do that.  If there's something
to be optimized there, it should be done in upper layers.  It's
playing with fire.

> If my understanding is correct, then for future readers of the archive
> (perhaps I should put this list in Documentation/ ?) the semantics are
> something like:
>
> 1. Block drivers may handle requests received in any order, and may
>  issue completions in any order, subject only to the rules below.
>
> 2. If a read covering a given block X is received after one or more writes
>  for that block, then irrespective of the order in which the read
>  and write(s) are handled/completed, the read shall return the
>  value written by the immediately preceding write to that block.
>
>  Therefore whilst the following is legal...
>
>       Driver sends                        Driver replies
>
>       WRITE BLOCK 1 = X
>                                           WRITE BLOCK 1 COMPLETED
>       .... time passes ...
>       READ BLOCK 1
>       WRITE BLOCK 1 = Y
>                                           WRITE BLOCK 1 COMPLETED
>                                           READ BLOCK 1 COMPLETED
>
>  ...the read from block 1 should return X and not Y, even if it was
>  handled by the driver after the write.

This is usually synchronized in the upper layer.  AFAIK filesystems
don't issue overlapping reads and writes simultaneously (right?), and
in the above case I don't think READ BLOCK 1 returning Y would be
illegal.  There are no ordering constraints between them anyway, and
the block layer would happily reorder the second write in front of the
read.
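
To illustrate, here is a toy enumeration in Python (not kernel code;
the function name is hypothetical). It assumes the first write of X has
already completed and that the read and the second write are both in
flight with no ordering between them:

```python
from itertools import permutations


def possible_read_results():
    """Collect every value the in-flight read could legally return."""
    results = set()
    in_flight = [("read", None), ("write", "Y")]
    for order in permutations(in_flight):
        block = "X"  # the first write has already completed
        for op, value in order:
            if op == "write":
                block = value
            else:
                results.add(block)
    return results


print(sorted(possible_read_results()))  # ['X', 'Y']
```

Both X and Y show up, depending only on which of the two in-flight
requests happens to be handled first.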

> 3. If a flush request is received, then before completing it (and,
>  in the case of a make_request_function driver, before initiating
>  any attached write), the driver MUST have written to non-volatile
>  storage any writes which were COMPLETED prior to the reception
>  of the flush. This does not affect any writes received, but
>  not completed, prior to the flush, nor does it prevent a block driver
>  from completing subsequently issued writes before completion of the
>  flush. IE the flush does not act as a barrier, it merely ensures that
>  on completion of the flush non-volatile storage contains either the
>  blocks written to prior to the flush or blocks written to in commands
>  issued subsequent to the flush, but completed prior to it.
>
> 4. Requests marked FUA should be written to non-volatile storage prior
>  to completion, but impose no restrictions on ordering.

Hmm... For bio drivers, REQ_FLUSH and REQ_FUA are best explained
together.  The following are legal combinations.

* No write data, REQ_FLUSH - doesn't have any ordering constraint
other than the inherent FLUSH requirement (previously completed WRITEs
should be on the media on FLUSH completion).

* Write data, REQ_FLUSH - FLUSH must be completed before write data is
issued.  ie. write data must not be written to the media before all
previous writes are on the media.

* Write data, REQ_FUA - Write should be completed before FLUSH is
issued - ie. the write data should be on platter along with previously
completed writes on bio completion.

* Write data, REQ_FLUSH | REQ_FUA - Write data must not be written to
the media before all previous writes are on the media && the write
data must be on the media on bio completion.  This is usually
sequenced as FLUSH write FLUSH.
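
The four combinations above can be sketched as a small Python helper
that returns the order of steps.  This is only an illustration: the
function name sequence() and the flag bit values are made up here and
are not the kernel's, and it assumes FUA is emulated with a trailing
FLUSH as described above:

```python
REQ_FLUSH = 1 << 0  # illustrative bit values, not the kernel's
REQ_FUA = 1 << 1


def sequence(flags, has_data):
    """Return the steps for one bio, in the order they must happen."""
    if not has_data and flags != REQ_FLUSH:
        raise ValueError("without write data only a plain REQ_FLUSH is legal")
    steps = []
    if flags & REQ_FLUSH:
        steps.append("FLUSH")  # previously completed writes reach the media
    if has_data:
        steps.append("WRITE")
    if flags & REQ_FUA:
        steps.append("FLUSH")  # write data is on the media at completion
    return steps


print(sequence(REQ_FLUSH, False))           # ['FLUSH']
print(sequence(REQ_FLUSH, True))            # ['FLUSH', 'WRITE']
print(sequence(REQ_FUA, True))              # ['WRITE', 'FLUSH']
print(sequence(REQ_FLUSH | REQ_FUA, True))  # ['FLUSH', 'WRITE', 'FLUSH']
```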

Request based drivers only see REQ_FLUSH w/o write data, and the only
rule they have to follow is that all writes they completed prior to
receiving FLUSH must be on the media on completion of FLUSH.  Being
smart about it might not be a good idea.

Thanks.

-- 
tejun