Nico Williams, on 10/24/2012 05:17 PM wrote:
Yes, SCSI has full support for ordered/simple commands designed exactly for
that task: [...]
[...]
But historically, for some reason, Linux storage developers were stuck with the
"barriers" concept, which is obviously not the same as ORDERED commands, and
hence had a lot of trouble with its ambiguous semantics. As far as I can tell,
the reason was a lack of sufficiently deep SCSI understanding (how to handle
errors, the belief that ACA is something legacy from parallel SCSI times,
etc.).
Barriers are a very simple abstraction, so there's that.
It isn't simple at all. If you think for a while about barriers from the storage
point of view, you will soon realize how bad and ambiguous they are.
Until that happens, people will keep coming back again and again with the same
simple questions: why must the queue be flushed for every ordered operation?
Isn't that obvious overkill?
That [cache flushing]
It isn't cache flushing, it's _queue_ flushing. You can call it queue draining, if
you like.
Often it makes a big difference where it's done: on the system side or on the
storage side.
Actually, in many cases the performance improvement from NCQ comes not from
letting the drive reorder requests, as is commonly thought, but from keeping the
drive's internal processing stages busy at all times, with no idle time. Drives
often have a long internal pipeline. Hence the need to keep every stage of it
busy, and hence why using ORDERED commands is important for performance.
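To put rough numbers on the pipeline argument, here is a toy model. Everything
in it is an assumption for illustration, not a measurement: the drive is
modelled as a pipeline of `depth` stages, each taking one tick, and
`pipelined_ticks` and `drained_ticks` are hypothetical helper names. The first
function is the best case, where the pipeline never runs empty (what ORDERED
commands could approach if the drive enforces ordering internally). The second
drains the queue after every `group` commands, as a host-side barrier does.

```python
def pipelined_ticks(n_cmds: int, depth: int) -> int:
    """Ticks to finish n_cmds when the pipeline never runs empty."""
    if n_cmds == 0:
        return 0
    # The first command needs `depth` ticks; each later one finishes 1 tick after.
    return depth + n_cmds - 1


def drained_ticks(n_cmds: int, depth: int, group: int) -> int:
    """Ticks to finish n_cmds when the queue is drained after every `group`."""
    total = 0
    remaining = n_cmds
    while remaining > 0:
        batch = min(group, remaining)
        # After a drain the pipeline is empty, so every batch pays the
        # full fill latency again.
        total += pipelined_ticks(batch, depth)
        remaining -= batch
    return total


if __name__ == "__main__":
    print(pipelined_ticks(100, 8))
    print(drained_ticks(100, 8, 10))
```

In this model the refill cost is paid once per drain, so the smaller the groups
between barriers, the more of the pipeline sits idle.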
is not what's being asked for here. Just a
light-weight barrier. My proposal works without having to add new
system calls: a) use a COW format, b) have background threads doing
fsync()s, c) in each transaction's root block note the last
known-committed (from a completed fsync()) transaction's root block,
d) have an array of well-known uberblocks large enough to accommodate
as many transactions as possible without having to wait for any one
fsync() to complete, e) do not reclaim space from any one past
transaction until at least one subsequent transaction is fully
committed. This obtains ACI- transaction semantics (survives power
failures but without durability for the last N transactions at
power-failure time) without requiring changes to the OS at all, and
with support for delayed D (durability) notification.
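The bookkeeping in steps c) to e) can be sketched roughly as follows. This is
an in-memory model only, under stated assumptions: `Uberblock`,
`UberblockRing` and all method names are hypothetical, the `intact` flag
stands in for a real checksum over the transaction's tree, and transaction 0
is taken to be the empty, already durable state. It is not the proposal's
actual on-disk format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Uberblock:
    txg: int             # transaction number of this root block
    last_committed: int  # newest txg known durable when this one was written
    intact: bool = True  # assumption: stands in for a checksum of the tree


class UberblockRing:
    def __init__(self, slots: int) -> None:
        self.slots: List[Optional[Uberblock]] = [None] * slots
        self.next_txg = 1
        self.last_committed = 0  # txg 0 = empty, already durable state

    def can_write(self) -> bool:
        # Never overwrite the slot of the last committed transaction.
        return self.next_txg - self.last_committed < len(self.slots)

    def write_txg(self) -> Optional[int]:
        """Write the next root block; None means 'wait for an fsync()'."""
        if not self.can_write():
            return None
        txg = self.next_txg
        self.slots[txg % len(self.slots)] = Uberblock(txg, self.last_committed)
        self.next_txg += 1
        return txg

    def fsync_completed(self, txg: int) -> None:
        """Called by a background thread once txg is durable."""
        self.last_committed = max(self.last_committed, txg)

    def reclaimable(self, txg: int) -> bool:
        # Space of txg is free only once a later transaction is committed.
        return txg < self.last_committed

    def recover(self) -> int:
        """After a power failure, pick the newest root block that validates."""
        intact = [u.txg for u in self.slots if u is not None and u.intact]
        return max(intact, default=0)
```

With four slots, for example, three transactions can be written before the
first fsync() has to complete; a crash that tears the newest root blocks rolls
back to the newest intact one, which loses durability for the last few
transactions but not consistency.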
I believe what you really want is to be able to send to the storage a sequence of
your favorite operations (FS operations, async IO operations, etc.) like:
Write back caching disabled:
data op11, ..., data op1N, ORDERED data op1, data op21, ..., data op2M, ...
Write back caching enabled:
data op11, ..., data op1N, ORDERED sync cache, ORDERED FUA data op1, data op21,
..., data op2M, ...
Right?
(ORDERED means it is guaranteed that the ordered command will never, under any
circumstances, be executed before all previous commands have completed or after
any subsequent command has completed.)
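For concreteness, the two sequences and the ORDERED rule can be written down
as a small model. This is only a sketch: `commit_sequence` and
`respects_ordered` are hypothetical names, commands are plain tuples rather
than real SCSI CDBs, and a command is treated as executing atomically at the
moment it completes.

```python
from typing import List, Sequence, Tuple

SIMPLE, ORDERED = "SIMPLE", "ORDERED"
Command = Tuple[str, str]  # (task attribute, operation)


def commit_sequence(before: Sequence[str], commit_op: str,
                    after: Sequence[str], write_back_cache: bool) -> List[Command]:
    """Build the command stream for one ordering point."""
    seq: List[Command] = [(SIMPLE, op) for op in before]
    if write_back_cache:
        # Flush cached data first, then write the commit op through the cache.
        seq.append((ORDERED, "SYNCHRONIZE CACHE"))
        seq.append((ORDERED, "FUA " + commit_op))
    else:
        seq.append((ORDERED, commit_op))
    seq.extend((SIMPLE, op) for op in after)
    return seq


def respects_ordered(commands: Sequence[Command],
                     completion_order: Sequence[int]) -> bool:
    """Check a completion order (indices into commands) against ORDERED."""
    pos = {idx: p for p, idx in enumerate(completion_order)}
    for i, (attr, _) in enumerate(commands):
        if attr != ORDERED:
            continue
        for j in range(len(commands)):
            if j < i and pos[j] > pos[i]:
                return False  # an earlier command finished after the ordered one
            if j > i and pos[j] < pos[i]:
                return False  # a later command finished before the ordered one
    return True


if __name__ == "__main__":
    seq = commit_sequence(["op11", "op12"], "op1", ["op21"], False)
    print(seq)
    print(respects_ordered(seq, [1, 0, 2, 3]))
```

Note that the SIMPLE commands on either side of the ordered one may still
complete in any order among themselves, and nothing here requires the host to
drain its queue.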
Vlad