Actually-- I give up. You've initially bounced every single one of my patches, even the ones that fix crash & data corruption bugs. I spend 10x as much time fighting with you as writing stuff for bcache, and basically every time it's turned out that you're wrong. I will go do something else where it is just not so much work to get something done. Mike On Sat, Sep 30, 2017 at 12:13 AM, Michael Lyle <mlyle@xxxxxxxx> wrote: > Coly-- > > > On Fri, Sep 29, 2017 at 11:58 PM, Coly Li <i@xxxxxxx> wrote: >> On 2017/9/30 上午11:17, Michael Lyle wrote: >> [snip] >> >> If writeback_rate is not minimum value, it means there are front end >> write requests existing. > > This is wrong. Else we'd never make it back to the target. > >> In this case, backend writeback I/O should nice >> I/O throughput to front end I/O. Otherwise, application will observe >> increased I/O latency, especially when dirty percentage is not very >> high. For enterprise workload, this change hurts performance. > > No, utilizing less of disk throughput for a given writeback rate > improves workload latency. :P The maximum that will be aggregated in > this way is constrained more strongly than the old code... > >> An desired behavior for low latency enterprise workload is, when dirty >> percentage is low, once there is front end I/O, backend writeback should >> be at minimum rate. This patch will introduce unstable and unpredictable >> I/O latency. > > Nope. It lowers disk utilization overall, and the amount of disk > bandwidth any individual request chunk can use is explicitly bounded > (unlike before, where it was not). > >> Unless there is performance bottleneck of writeback seeking, at least >> enterprise users will focus more on front end I/O latency .... > > The less writeback seeks the disk, the more the real workload can seek > this disk! And if you're at high writeback rates, the vast majority > of the time the disk is seeking! > >> This method is helpful only when writeback I/Os is not issued >> continuously, other wise if they are issued within slice_idle, >> underlying elevator will reorder or merge the I/Os in larger request. > > We have a subsystem in place to rate-limit and make sure that they are > not issued continuously! If you want to preserve good latency for > userspace, you want to keep the system in the regime where that > control system is effective! > >> Hmm, if you move the dirty IO from btree into dirty_io list, then >> perform I/O, there is risk that once machine is power down during >> writeback there might be dirty data lost. If you continuously issue >> dirty I/O and remove them from btree at same time, that means you will >> introduce more latency to front end I/O... > > No... this doesn't change the consistency properties. I am just > saying-- if we have (up to 5 contiguous things that we're going to > write back), wait till they're all read; plug; dispatch writes ; > unplug. > >> And plug list will be unplugged automatically as default, when context >> switching happens. If you will performance read I/Os to the btrees, a >> context switch is probably to happen, then you won't keep a large bio >> lists ... > > Thankfully we'll have 5 things to fire immediately after each other, > so we don't need to worry about automatic unplug. > >> IMHO when writeback rate is low, especially when backing hard disk is >> not bottleneck, group continuous I/Os in bcache code does not help too >> much for writeback performance. The only benefit is less I/O issued when >> front end I/O is low or idle, but not most of users care about it, >> especially enterprise users. > > I disagree! > >>> I believe patch 4 is useful on its own, but I have this and other >>> pieces of development that depend upon it. >> >> Current bcache code works well in most of writeback loads, I just worry >> that implementing an elevator in bcache writeback logic is a big >> investment with a little return. > > Bcache already implements a (one-way) elevator! Bcache invests > **significant effort** already to do writebacks in LBA order (to > effectively short-stroke the disk), but because of different arrival > times of the read request they can end up reordered. This is bad. > This is a bug. > > Mike > >> >> -- >> Coly Li >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-bcache" in >> the body of a message to majordomo@xxxxxxxxxxxxxxx >> More majordomo info at http://vger.kernel.org/majordomo-info.html