Questions on block drivers, REQ_FLUSH and REQ_FUA

I am writing (well, designing at the moment) a block driver where there is
a very high variance in time to write the underlying blocks. I have some
questions about the interface to block drivers which I would be really
grateful if someone could answer. I believe they are not answered by
existing documentation and I am happy to write up and contribute something
for Documentation/ in return for answers.

Q1: This may seem pretty basic, but I presume I am allowed to answer
requests submitted to my block driver in an order other than the order
in which they are submitted, i.e. can I do this:

       Receive        Reply
       =======        =====
       READ1
       READ2
                      REPLY READ2
                      REPLY READ1

My understanding is "yes". If the answer is "no", I am overcomplicating
things and most of the rest of this is irrelevant.

Q2: Will I ever get a sequence where I receive a read for a block for
which I have already received, but not yet responded to, a write? If so,
I presume I have to ensure I do not send back "old data", but that
reordering is still acceptable, i.e. this is OK, but different values
for the replies are not:

       Receive        Reply
       =======        =====
       WRITE1 blkX=A
       READ1 blkX
       WRITE2 blkX=B
       READ2 blkX
                      REPLY READ2=B
                      REPLY READ1=A
                      REPLY WRITE1
                      REPLY WRITE2
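
The presumption in Q2 can be written down as a small trace checker. This
is only a sketch in Python: the event names and the stale_reads function
are made up for illustration, and the rule it encodes (a read must return
the value of the last write to that block received before the read) is an
assumption taken from the question, not something confirmed by the block
layer documentation.

```python
UNKNOWN = object()  # contents of a block before any write to it was received


def stale_reads(events):
    """Return the ids of reads that replied with data older than expected.

    Assumed rule: a read must return the value of the most recent write to
    the same block that was received before the read was received.
    """
    latest = {}    # block -> value of the most recently received write
    expected = {}  # read id -> value that read is expected to return
    stale = []
    for event in events:
        kind = event[0]
        if kind == "recv_write":
            _, block, value = event
            latest[block] = value
        elif kind == "recv_read":
            _, read_id, block = event
            expected[read_id] = latest.get(block, UNKNOWN)
        elif kind == "reply_read":
            _, read_id, value = event
            want = expected[read_id]
            if want is not UNKNOWN and want != value:
                stale.append(read_id)
    return stale


# The sequence from the table above: replies are reordered, values are not.
q2_trace = [
    ("recv_write", "blkX", "A"),
    ("recv_read", "READ1", "blkX"),
    ("recv_write", "blkX", "B"),
    ("recv_read", "READ2", "blkX"),
    ("reply_read", "READ2", "B"),
    ("reply_read", "READ1", "A"),
]
print(stale_reads(q2_trace))
```

Under that assumed rule the table's sequence passes, while swapping the
two reply values would flag both reads.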

Q3: Apparently there are no longer concepts of barriers, just REQ_FLUSH
and REQ_FUA. REQ_FLUSH guarantees all "completed" I/O requests are written
to disk prior to that BIO starting. However, what about non-completed I/O
requests? For instance, is the following legitimate:

       Receive        Send to disk         Reply
       =======        ============         =====
       WRITE1
       WRITE2
                                           WRITE2 (cached)
       FLUSH+WRITE3
                      WRITE2
                      WRITE3
                                           WRITE3
       WRITE4
                      WRITE4
                                           WRITE4
                      WRITE1
                                           WRITE1

Here WRITE1 was not "completed", and thus by the text of
Documentation/block/writeback_cache_control.txt, need not be written to
disk before starting WRITE3 (which had REQ_FLUSH attached). That document
says:

    The REQ_FLUSH flag can be ORed into the r/w flags of a bio submitted
    from the filesystem and will make sure the volatile cache of the
    storage device has been flushed before the actual I/O operation is
    started.  This explicitly guarantees that previously completed write
    requests are on non-volatile storage before the flagged bio starts.

I presume this is illegal and is a documentation issue.
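
To make the two readings of that paragraph concrete, here is a minimal
Python sketch that checks a trace against each of them. The event format
and the flush_violations function are hypothetical. With strict=False it
applies the documented wording (only writes already replied to when the
flagged request arrives must be on disk first); with strict=True it
applies the stronger reading (every write received earlier must be on disk
first). Treating "the flagged bio starts" as "its data is sent to disk" is
also an assumption.

```python
def flush_violations(events, strict=False):
    """Return (flush_request, missing_write) pairs that break the flush rule.

    events: ("recv", name[, is_flush]), ("disk", name) or ("reply", name).
    """
    received = []      # writes received so far, in order
    replied = set()    # writes already completed back to the submitter
    on_disk = set()    # writes that have reached stable storage
    must_precede = {}  # flush request -> writes required on disk before it
    violations = []
    for kind, name, *rest in events:
        if kind == "recv":
            if rest and rest[0]:
                must_precede[name] = set(received) if strict else set(replied)
            received.append(name)
        elif kind == "reply":
            replied.add(name)
        elif kind == "disk":
            for missing in sorted(must_precede.get(name, set()) - on_disk):
                violations.append((name, missing))
            on_disk.add(name)
    return violations


# The sequence from the Q3 table.
q3_trace = [
    ("recv", "WRITE1"),
    ("recv", "WRITE2"),
    ("reply", "WRITE2"),          # completed from cache
    ("recv", "WRITE3", True),     # FLUSH+WRITE3
    ("disk", "WRITE2"),
    ("disk", "WRITE3"),
    ("reply", "WRITE3"),
    ("recv", "WRITE4"),
    ("disk", "WRITE4"),
    ("reply", "WRITE4"),
    ("disk", "WRITE1"),
    ("reply", "WRITE1"),
]
print(flush_violations(q3_trace))               # documented wording
print(flush_violations(q3_trace, strict=True))  # stronger reading
```

The trace passes under the documented wording and fails only under the
stronger reading, which is exactly the gap the question is about.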

Q4: Can I reorder write requests forwards across flushes, i.e. handle a
write received after a flush before the flush itself? For example, can I
do this:

       Receive        Send to disk         Reply
       =======        ============         =====
       WRITE1
                                           WRITE1 (cached)
       WRITE2
                                           WRITE2 (cached)
       FLUSH+WRITE3
       WRITE4
                      WRITE4
                                           WRITE4
                      WRITE2
                      WRITE3
                                           WRITE3

Again this does not appear to be illegal, as the FLUSH operation is
not defined as a barrier. In theory it should therefore be possible
to handle (and write to disk) requests received after the FLUSH
request before the FLUSH request finishes, provided that the commands
received before the FLUSH request themselves complete before the FLUSH
request is replied to. I really don't know what the answer is to this
one. It makes a big difference to me as I can write multiple blocks in
parallel, and would really rather not hold up later write requests
until everything is flushed unless I need to.
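
The behaviour I would like to be allowed can be sketched as follows, in
Python for brevity. The FlushTracker class and its method names are
invented for this illustration. It takes the conservative assumption that
a flush-flagged request waits for every write received before it (not only
the completed ones), and for simplicity it replies to plain writes only
once they are on disk. Whether the block layer actually permits letting
later writes overtake the flush in this way is the open question.

```python
class FlushTracker:
    """Decide when requests may be replied to without treating a flush
    as a barrier for writes received after it."""

    def __init__(self):
        self.pending = set()  # requests received but not yet on disk
        self.waiting = {}     # flush request -> earlier writes still pending

    def receive(self, name, flush=False):
        if flush:
            # Snapshot the earlier writes this flush must wait for.
            self.waiting[name] = set(self.pending)
        self.pending.add(name)

    def on_disk(self, name):
        """Record that `name` reached stable storage and return the
        requests that may now be replied to."""
        self.pending.discard(name)
        for deps in self.waiting.values():
            deps.discard(name)
        # A plain write is answered as soon as it is on disk.
        ready = [] if name in self.waiting else [name]
        # A flush is answered once its own data and all its deps are on disk.
        for req in list(self.waiting):
            if req not in self.pending and not self.waiting[req]:
                ready.append(req)
                del self.waiting[req]
        return ready


tracker = FlushTracker()
tracker.receive("WRITE1")
tracker.receive("WRITE2")
tracker.receive("WRITE3", flush=True)
tracker.receive("WRITE4")
replies = [tracker.on_disk(n) for n in ("WRITE4", "WRITE3", "WRITE2", "WRITE1")]
print(replies)
```

Here WRITE4 is written and answered ahead of the flush, while the reply to
FLUSH+WRITE3 is held back until WRITE1 and WRITE2 are on disk.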

Any assistance gratefully received!

--
Alex Bligh
