Re: [PATCH 4/5] bcache: writeback: collapse contiguous IO better

On 2017/9/30 3:13 PM, Michael Lyle wrote:
> Coly--
> 
> 
> On Fri, Sep 29, 2017 at 11:58 PM, Coly Li <i@xxxxxxx> wrote:
>> On 2017/9/30 11:17 AM, Michael Lyle wrote:
>> [snip]
>>
>> If writeback_rate is not at its minimum value, it means there are
>> front-end write requests in flight.
> 
> This is wrong.  Else we'd never make it back to the target.
> 

Of course we can :-) When the amount of dirty data is far beyond the
dirty target percentage, the writeback rate is quite high, and in that
case dc->last_read takes effect. But that is not the situation your
patch improves.


>> In this case, backend writeback I/O should yield throughput to
>> front-end I/O. Otherwise, applications will observe increased I/O
>> latency, especially when the dirty percentage is not very high. For
>> enterprise workloads, this change hurts performance.
> 
> No, utilizing less of the disk's throughput for a given writeback
> rate improves workload latency.  :P  The maximum that will be
> aggregated in this way is constrained more strongly than in the old
> code...
> 

Could you please show exact latency comparison data? Imagining that
from the code alone is too hard for me ...


>> A desired behavior for low-latency enterprise workloads is: when the
>> dirty percentage is low, as soon as there is front-end I/O, backend
>> writeback should drop to the minimum rate. This patch will introduce
>> unstable and unpredictable I/O latency.
> 
> Nope.  It lowers disk utilization overall, and the amount of disk
> bandwidth any individual request chunk can use is explicitly bounded
> (unlike before, where it was not).
> 

It would be great if we could see performance data here.

>> Unless writeback seeking is a performance bottleneck, enterprise
>> users at least will focus more on front-end I/O latency ....
> 
> The less writeback seeks the disk, the more the real workload can seek
> this disk!  And if you're at high writeback rates, the vast majority
> of the time the disk is seeking!
> 

You are right. But when the writeback rate is high, dc->last_read
already solves the seeking problem, just as your patch does.


>> This method helps only when writeback I/Os are not issued
>> continuously; otherwise, if they are issued within slice_idle, the
>> underlying elevator will reorder or merge the I/Os into larger
>> requests.
> 
> We have a subsystem in place to rate-limit and make sure that they are
> not issued continuously!  If you want to preserve good latency for
> userspace, you want to keep the system in the regime where that
> control system is effective!
> 


Do you mean that write I/Os cannot be issued within slice_idle, or
that they are issued within slice_idle but the underlying elevator does
not merge the contiguous requests?
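
For reference, the rate-limit subsystem mentioned above paces writeback
by computing a delay after each piece of work, so that I/Os are spread
out in time instead of being queued back to back. A conceptual sketch
of that kind of pacing, with made-up names rather than the actual
bch_ratelimit code:

	#include <stdint.h>

	/* Pace work to 'rate' sectors per second: each completed chunk
	 * pushes the earliest allowed time for the next submission
	 * further into the future; the caller sleeps for the returned
	 * number of nanoseconds before issuing more I/O.
	 */
	static uint64_t next_allowed_ns;	/* illustrative state */

	static uint64_t pacing_delay_ns(uint64_t now_ns, uint64_t done_sectors,
					uint64_t rate_sectors_per_sec)
	{
		next_allowed_ns += done_sectors * 1000000000ULL /
				   rate_sectors_per_sec;
		if (next_allowed_ns <= now_ns) {
			next_allowed_ns = now_ns;	/* no unbounded credit */
			return 0;
		}
		return next_allowed_ns - now_ns;
	}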


>> Hmm, if you move the dirty I/O from the btree into a dirty_io list
>> and then perform the I/O, there is a risk that dirty data is lost if
>> the machine powers down during writeback. If you continuously issue
>> dirty I/O and remove it from the btree at the same time, that means
>> you will introduce more latency to front-end I/O...
> 
> No... this doesn't change the consistency properties.  I am just
> saying -- if we have (up to 5 contiguous things that we're going to
> write back), wait till they're all read; plug; dispatch writes;
> unplug.
> 
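
A rough sketch of that plug / dispatch / unplug sequence, using the
block layer's plugging API; the bio array and its size here are only
illustrative, not the patch's actual data structures:

	struct blk_plug plug;
	int i;

	/* all reads for the contiguous batch have completed by now */
	blk_start_plug(&plug);
	for (i = 0; i < nr_writes; i++)		/* e.g. up to 5 extents */
		submit_bio(write_bios[i]);	/* writes to the backing device */
	blk_finish_plug(&plug);			/* dispatch the batch together */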

I meant the dirty_io list in your future patch, not the current 5-bios
patch. Hmm, maybe you can ignore my noise here; I just went off topic.


>> And the plug list will be unplugged automatically by default when a
>> context switch happens. If you perform read I/Os on the btree, a
>> context switch is likely to happen, so you won't keep a large bio
>> list ...
> 
> Thankfully we'll have 5 things to fire immediately after each other,
> so we don't need to worry about automatic unplug.
> 

Forgive me, again I meant the dirty_io list stuff, not this patch...
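
To illustrate the automatic-unplug point: the scheduler flushes a
task's plug when the task blocks, so plugging only batches requests
that are submitted back to back without sleeping in between. A
schematic fragment, not code from this patch:

	blk_start_plug(&plug);
	submit_bio(first_write);
	/* if anything here sleeps, e.g. a btree lookup that blocks,
	 * schedule() flushes the plug and first_write is dispatched on
	 * its own, defeating the batching
	 */
	submit_bio(second_write);
	blk_finish_plug(&plug);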


>> IMHO, when the writeback rate is low, especially when the backing
>> hard disk is not the bottleneck, grouping contiguous I/Os in the
>> bcache code does not help writeback performance much. The only
>> benefit is fewer I/Os issued when front-end I/O is low or idle, and
>> most users, especially enterprise users, do not care about that.
> 
> I disagree!
> 
>>> I believe patch 4 is useful on its own, but I have this and other
>>> pieces of development that depend upon it.
>>
>> The current bcache code works well for most writeback loads; I just
>> worry that implementing an elevator in the bcache writeback logic is
>> a big investment with little return.
> 
> Bcache already implements a (one-way) elevator!  Bcache invests
> **significant effort** already to do writebacks in LBA order (to
> effectively short-stroke the disk), but because of the different
> arrival times of the read requests they can end up reordered.  This
> is bad.  This is a bug.

Could you please show exact benchmark results? Then it will be easier
to change my mind. As far as I know, when writeback I/O is not issued
continuously within slice_idle, it is mostly because the dirty read is
delayed at line 238 of read_dirty():

235	if (KEY_START(&w->key) != dc->last_read ||
236	    jiffies_to_msecs(delay) > 50)
237		while (!kthread_should_stop() && delay)
238			delay = schedule_timeout_interruptible(delay);

If the writeback rate is high and the writeback I/O is contiguous, the
read I/O will be issued without delay, and then the writeback I/O will
be issued by write_dirty() very shortly after.
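
Restated as a schematic (names not quoted above are placeholders, not
the exact bcache closure code):

	/* for each dirty key w, roughly: */
	if (KEY_START(&w->key) == dc->last_read &&	/* contiguous with the last read */
	    jiffies_to_msecs(delay) <= 50)
		delay = 0;				/* issue the read immediately */
	else
		wait_for(delay);			/* placeholder for the delay loop */
	submit_dirty_read(w);				/* placeholder: read from the cache */
	/* on read completion, write_dirty() submits the write to the
	 * backing device, so contiguous keys produce back-to-back writes
	 */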

Therefore, without a real performance comparison, I cannot imagine why
adjacent write requests would not be merged in the elevator ... unless
I do the test myself.


-- 
Coly Li


