On Mon, 14 Feb 2011 23:57:54 +0100 Andras Korn <korn@xxxxxxxxxxxxxxxxxxxxxxx> wrote:

> On Tue, Feb 15, 2011 at 09:50:42AM +1100, NeilBrown wrote:
>
> > > I experimented a bit with write-mostly and write-behind and found that
> > > write-mostly provides a very significant benefit (see below) but
> > > write-behind seems to have no effect whatsoever.
> >
> > The use-case where write-behind can be expected to have an effect is when
> > the throughput is low enough to be well within the capacity of all
> > devices, but the latency of the write-behind device is higher than
> > desired. write-behind will allow that high latency to be hidden (as long
> > as the throughput limit is not exceeded).
> >
> > I suspect your tests did not test for low latency in a low-throughput
> > scenario.
>
> I thought they did. "High latency" was, in my case, caused by the high seek
> times (compared to the SSD) of the spinning disks. Throughput-wise, they
> certainly could have kept up (their sequential read/write performance even
> exceeds that of the SSD).

An "MB/s" number is not going to show a difference with write-behind, as it
is fundamentally about throughput. We cannot turn random writes into
sequential writes just by doing 'write-behind': the same locations on disk
still have to be written to. You need a number like transactions-per-second
to see a difference. If you write with O_SYNC, write-behind will probably
show a difference.

> But maybe I misunderstand how write-behind works. I thought/hoped it would
> commit writes to the fast drive(s) and mark affected areas dirty in the
> intent map, then lazily sync the dirty areas over to the slow disk(s).
>
> What does it actually do? md(4) isn't very forthcoming, and the wiki has no
> relevant hits either.

write-behind makes a copy of the data, submits writes to all devices in
parallel, and reports success to the upper layer as soon as all the
non-write-behind writes have finished.

The approach you suggest could be synthesised by:
 - adding a write-intent bitmap with fairly small chunks. This should be an
   external bitmap and should be directly on the fastest drive;
 - having some daemon that fails the 'slow' device, waits 30 seconds,
   re-adds it, waits for recovery to complete, and loops back.

Actually, I just realised another reason why you don't see any improvement.
You are using an internal bitmap. This requires a synchronous write to both
devices. The use-case for which write-behind was developed involved an
external bitmap.

Maybe I should disable bitmap updates to write-behind devices .....

NeilBrown
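
[For anyone wanting to reproduce the latency-oriented measurement Neil
describes above (synchronous writes per second rather than MB/s), a minimal
sketch follows. The target path is a hypothetical file on a filesystem
sitting on the RAID1 array; with write-behind working, each O_SYNC write
should, in principle, complete without waiting for the spinning disk,
subject to the internal-bitmap caveat Neil mentions.]

#!/usr/bin/env python3
# Minimal O_SYNC write-latency probe (sketch, not a benchmark tool).
# TARGET is hypothetical: a file on the filesystem that lives on the array.
import os
import time

TARGET = "/mnt/md0/latency-test"   # hypothetical mount point of the RAID1
BLOCK = os.urandom(4096)
COUNT = 1000

fd = os.open(TARGET, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
start = time.monotonic()
for _ in range(COUNT):
    os.write(fd, BLOCK)            # with O_SYNC this returns only once the
    os.lseek(fd, 0, os.SEEK_SET)   # data is stable; rewrite the same block so
                                   # we measure latency, not throughput
elapsed = time.monotonic() - start
os.close(fd)

print(f"{COUNT / elapsed:.0f} synchronous writes/s "
      f"({elapsed / COUNT * 1000:.2f} ms per write)")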
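
[And a rough sketch of the "fail, wait, re-add" daemon Neil outlines above.
The array name (/dev/md0), slow member (/dev/sdb1) and 30-second idle window
are assumptions, and it relies on an already-configured write-intent bitmap
so that --re-add only resyncs the chunks dirtied while the slow device was
out. A real daemon would need proper error handling and more care about the
window between --re-add and the start of recovery.]

#!/usr/bin/env python3
# Sketch of the fail/re-add loop described above. Run as root.
# ARRAY and SLOW_DEV are assumptions; adjust for the real setup.
import subprocess
import time

ARRAY = "/dev/md0"       # hypothetical md array
SLOW_DEV = "/dev/sdb1"   # hypothetical slow (write-mostly) member
IDLE_TIME = 30           # seconds to leave the slow device failed

def recovery_done():
    # sync_action reads "idle" once resync/recovery has finished
    md = ARRAY.rsplit("/", 1)[-1]
    with open(f"/sys/block/{md}/md/sync_action") as f:
        return f.read().strip() == "idle"

while True:
    # Kick the slow device out; writes now complete on the fast device(s)
    # and the write-intent bitmap tracks what the slow device misses.
    subprocess.run(["mdadm", ARRAY, "--fail", SLOW_DEV], check=True)
    subprocess.run(["mdadm", ARRAY, "--remove", SLOW_DEV], check=True)

    time.sleep(IDLE_TIME)

    # Re-add it; with a bitmap this is an incremental resync of dirty chunks.
    subprocess.run(["mdadm", ARRAY, "--re-add", SLOW_DEV], check=True)
    time.sleep(2)        # crude: give recovery a moment to start
    while not recovery_done():
        time.sleep(1)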