I think one of the problems here is that there are no consistent requirements or figure of merit for performance. You've argued against changes because of A) perceived impact on front-end I/O latency, B) writeback rate at idle, C) peak instantaneous writeback rate, and D) time to write back the entire amount. These are all, to some extent, contradictory.

I was trying to make changes in writeback to improve **efficiency**-- the amount of writeback completed per unit of time the backing device is tied up. We could have added tunables for A/B/C to pick any operating point along that curve.

In addition, the changes I was trying to make were **on the path to further improvement**-- submitting sequential I/O together with plugging is known to improve merging (rough sketch below my signature). Users on #bcache IRC on oftc.net constantly talk about writeback performance and writeback use cases-- it's probably the most common topic of discussion about bcache. These changes would have improved that and laid the groundwork for further improvement.

I've retired, but a year ago I was running a bcache environment with about 40 computers running development environments and continuous integration test rigs; we would provision and tear down hundreds of VMs per day. Machines had 256-512GB of RAM, 1TB SATA RAID1 cache volumes, and RAID10 sets 6-8 drives wide. The use case was enterprise DB-- so I think I know a thing or two about enterprise use cases ;) Writeback did obviously bad things in that environment, which is why I wanted to fix it. I also see obviously bad behavior on my small home test rig of RAID1 480GB NVMe + RAID1 6TB disks.

Mike
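P.S. To pin down what I mean by "efficiency" as a figure of merit-- roughly, and with made-up names (these aren't existing bcache counters):

    efficiency = dirty_sectors_written_back / seconds_backing_device_busy

A/B/C/D each move this ratio differently; larger, better-merged writeback I/Os raise it, because the disks spend less seek time per sector retired.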
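P.P.S. And here's a minimal sketch of the plugging idea, since it keeps coming up. The helper and the bio array are hypothetical stand-ins for bcache's real writeback structures; blk_start_plug()/blk_finish_plug() and submit_bio() are the actual block-layer API:

    #include <linux/bio.h>
    #include <linux/blkdev.h>

    /*
     * Hypothetical helper: submit a run of writeback bios that are
     * contiguous on the backing device under a single plug.  While
     * the plug is held, the requests queue up per-task instead of
     * being dispatched one at a time; when the plug is flushed,
     * adjacent requests can be merged into fewer, larger I/Os
     * before they reach the spinning disks.
     */
    static void submit_contiguous_writeback(struct bio **bios, unsigned int nr)
    {
            struct blk_plug plug;
            unsigned int i;

            blk_start_plug(&plug);
            for (i = 0; i < nr; i++)
                    submit_bio(bios[i]);
            blk_finish_plug(&plug);
    }

The win is exactly the efficiency metric above: the same number of dirty sectors retired in fewer, larger disk operations.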