Coly--

On Fri, Sep 29, 2017 at 11:58 PM, Coly Li <i@xxxxxxx> wrote:
> On 2017/9/30 11:17 AM, Michael Lyle wrote:
> [snip]
>
> If writeback_rate is not minimum value, it means there are front end
> write requests existing.

This is wrong.  Else we'd never make it back to the target.

> In this case, backend writeback I/O should nice
> I/O throughput to front end I/O. Otherwise, application will observe
> increased I/O latency, especially when dirty percentage is not very
> high. For enterprise workload, this change hurts performance.

No, utilizing less of disk throughput for a given writeback rate
improves workload latency.  :P  The maximum that will be aggregated in
this way is constrained more strongly than in the old code...

> A desired behavior for low latency enterprise workload is, when dirty
> percentage is low, once there is front end I/O, backend writeback should
> be at minimum rate. This patch will introduce unstable and unpredictable
> I/O latency.

Nope.  It lowers disk utilization overall, and the amount of disk
bandwidth any individual request chunk can use is explicitly bounded
(unlike before, where it was not).

> Unless there is performance bottleneck of writeback seeking, at least
> enterprise users will focus more on front end I/O latency ....

The less writeback seeks the disk, the more the real workload can seek
this disk!  And if you're at high writeback rates, the vast majority
of the time the disk is seeking!

> This method is helpful only when writeback I/Os is not issued
> continuously, otherwise if they are issued within slice_idle,
> underlying elevator will reorder or merge the I/Os in larger request.

We have a subsystem in place to rate-limit and make sure that they are
not issued continuously!  If you want to preserve good latency for
userspace, you want to keep the system in the regime where that
control system is effective!

> Hmm, if you move the dirty IO from btree into dirty_io list, then
> perform I/O, there is risk that once machine is power down during
> writeback there might be dirty data lost. If you continuously issue
> dirty I/O and remove them from btree at same time, that means you will
> introduce more latency to front end I/O...

No... this doesn't change the consistency properties.  I am just
saying-- if we have up to 5 contiguous things that we're going to
write back, wait till they're all read; plug; dispatch writes; unplug.

> And plug list will be unplugged automatically as default, when context
> switching happens. If you will perform read I/Os to the btrees, a
> context switch is probably to happen, then you won't keep a large bio
> lists ...

Thankfully we'll have 5 things to fire immediately after each other,
so we don't need to worry about automatic unplug.

> IMHO when writeback rate is low, especially when backing hard disk is
> not bottleneck, group continuous I/Os in bcache code does not help too
> much for writeback performance. The only benefit is less I/O issued when
> front end I/O is low or idle, but not most of users care about it,
> especially enterprise users.

I disagree!

>> I believe patch 4 is useful on its own, but I have this and other
>> pieces of development that depend upon it.
>
> Current bcache code works well in most of writeback loads, I just worry
> that implementing an elevator in bcache writeback logic is a big
> investment with a little return.

Bcache already implements a (one-way) elevator!  Bcache invests
**significant effort** already to do writebacks in LBA order (to
effectively short-stroke the disk), but because of different arrival
times of the read requests, the writes can end up reordered.  This is
bad.  This is a bug.
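
To make the reordering argument above concrete, here is a minimal userspace sketch in Python. It is a toy model, not bcache code: the function names, the (start, length) extent tuples, and the random shuffle standing in for read completion order are all assumptions made for illustration. Only the batch limit of 5 and the "wait till they're all read; plug; dispatch writes; unplug" sequence come from the message.

```python
import random

MAX_BATCH = 5  # "up to 5 contiguous things", per the message; the name is hypothetical


def group_contiguous(extents, max_batch=MAX_BATCH):
    """Split (start, length) extents into LBA-ordered runs of contiguous extents."""
    batches = []
    for start, length in sorted(extents):
        last = batches[-1] if batches else None
        # Extend the current run only if it is not full and this extent
        # begins exactly where the previous one ends.
        if last and len(last) < max_batch and last[-1][0] + last[-1][1] == start:
            last.append((start, length))
        else:
            batches.append([(start, length)])
    return batches


def dispatch_as_reads_complete(batch, rng):
    """Toy model of the problem: each write is issued when its own read finishes."""
    order = list(batch)
    rng.shuffle(order)  # assumption: reads complete in arbitrary order
    return order


def dispatch_after_all_reads(batch, rng):
    """Toy model of the proposal: wait for every read, then write in LBA order."""
    completed = list(batch)
    rng.shuffle(completed)    # reads still complete in arbitrary order
    return sorted(completed)  # stands in for: plug; dispatch writes; unplug


if __name__ == "__main__":
    rng = random.Random(0)
    extents = [(0, 8), (8, 8), (16, 8), (100, 8), (108, 8), (24, 8)]
    for batch in group_contiguous(extents):
        print("batch:", batch)
        print("  writes as reads complete:", dispatch_as_reads_complete(batch, rng))
        print("  writes after all reads:  ", dispatch_after_all_reads(batch, rng))
```

In this model the second strategy always hands the backing disk one ascending run per batch, whereas the first may not, which is the reordering described above. Real plugging, bio handling, and rate limiting are out of scope here.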

Mike

>
> --
> Coly Li
--
To unsubscribe from this list: send the line "unsubscribe linux-bcache" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html