On Fri, Feb 25, 2011 at 02:56:32AM +0800, Jan Kara wrote:
> On Thu 24-02-11 23:25:09, Wu Fengguang wrote:
> > The bdi base throttle bandwidth is updated based on three classes of
> > parameters.
> >
> > (1) level of dirty pages
> >
> > We try to avoid updating the base bandwidth whenever possible. The
> > main update criteria are based on the level of dirty pages: when
> > - the dirty pages are near the upper or lower control scope, or
> > - the dirty pages are departing from the global/bdi dirty goals
> > it's time to update the base bandwidth.
> >
> > Because the dirty page count fluctuates constantly, we avoid
> > disturbing the base bandwidth when the smoothed number of dirty
> > pages is within (write bandwidth / 8) distance of the goal, based
> > on the fact that the fluctuations are typically bounded by the
> > write bandwidth.
> >
> > (2) the position bandwidth
> >
> > The position bandwidth is equal to the base bandwidth when the
> > number of dirty pages equals the dirty goal, and is scaled up/down
> > as the dirty pages grow above or drop below the goal.
> >
> > When it's decided to update the base bandwidth, the delta between
> > the base bandwidth and the position bandwidth is calculated. The
> > delta is scaled down at least 8 times -- and the smaller the delta,
> > the more it is shrunk -- before being added to the base bandwidth.
> > In this way, the base bandwidth adapts to the position bandwidth
> > quickly when there are large gaps, and remains stable when the gap
> > is small enough.
> >
> > The delta is scaled down considerably because the position
> > bandwidth is not very reliable. It fluctuates sharply when the
> > dirty pages hit the upper/lower limits, and it takes time for the
> > dirty pages to return to the goal even after the base bandwidth has
> > been adjusted to the right value. So if the position bandwidth were
> > tracked closely, the base bandwidth could overshoot.
> >
> > (3) the reference bandwidth
> >
> > It's the theoretical base bandwidth! I take the time to calculate
> > it as a reference value for the base bandwidth, to eliminate the
> > fast-convergence vs. steady-state-stability dilemma of pure
> > position-based control. It would be optimal control if used
> > directly; however, the reference bandwidth is not used as the base
> > bandwidth directly, because the numbers it is calculated from are
> > all fluctuating, and it's not acceptable for the base bandwidth to
> > fluctuate in the plateau state. So the roughly-accurate calculated
> > value is instead used as a very useful double limit when updating
> > the base bandwidth.
> >
> > Now you should be able to understand the information-rich
> > balance_dirty_pages-pages.png graph. Here are two nice ones:
> >
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/4G-60%25/btrfs-16dd-1M-8p-3927M-60%-2.6.38-rc6-dt6+-2011-02-24-23-14/balance_dirty_pages-pages.png
> >
> > http://www.kernel.org/pub/linux/kernel/people/wfg/writeback/dirty-throttling-v6/10HDD-JBOD-6G-6%25/xfs-1dd-1M-16p-5904M-6%25-2.6.38-rc5-dt6+-2011-02-21-20-00/balance_dirty_pages-pages.png
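
To make the above concrete, here is a rough C sketch of the update
step. The struct, field names, shift factors and the exact form of the
double-limit test are illustrative only -- they mirror the description
above, not the actual patch code:

/*
 * Sketch of the base bandwidth update -- illustrative names and
 * constants, not the real patch.  Units: pages/second.
 */
struct bw_control {
	long dirty;		/* smoothed number of dirty pages */
	long dirty_goal;	/* the dirty page goal */
	long write_bw;		/* measured write bandwidth */
	long pos_bw;		/* position bandwidth */
	long ref_bw;		/* reference (theoretical) bandwidth */
	long base_bw;		/* base throttle bandwidth, updated here */
};

static void update_base_bw(struct bw_control *c)
{
	long dist = c->dirty - c->dirty_goal;
	long delta;

	/*
	 * (1) skip the update while the smoothed dirty count stays
	 * within write_bw/8 of the goal: the dirty fluctuations are
	 * bounded by roughly one write_bw (the other update triggers
	 * are omitted from this sketch)
	 */
	if (dist < 0)
		dist = -dist;
	if (dist < c->write_bw / 8)
		return;

	/*
	 * (2) step toward the position bandwidth by a shrunk fraction
	 * of the gap: at least /8, and small gaps are shrunk further,
	 * so the base bandwidth stays put in the plateau state
	 */
	delta = c->pos_bw - c->base_bw;
	delta /= 8;
	if (delta < c->base_bw / 16 && delta > -(c->base_bw / 16))
		delta /= 4;

	/*
	 * (3) the reference bandwidth double-limits the step: never
	 * let the noisy position feedback carry base_bw across it
	 */
	if (delta > 0 && c->base_bw + delta > c->ref_bw)
		delta = c->ref_bw > c->base_bw ? c->ref_bw - c->base_bw : 0;
	if (delta < 0 && c->base_bw + delta < c->ref_bw)
		delta = c->ref_bw < c->base_bw ? c->ref_bw - c->base_bw : 0;

	c->base_bw += delta;
}

The step-(3) clamp is what resolves the dilemma: the position feedback
picks the direction, while the noisy-but-roughly-accurate theoretical
value bounds how far any single step may go.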

> Thanks for the update on your patch series :). As you probably noted,
> I've created patches which implement IO-less balance_dirty_pages()
> differently, so we have two implementations to compare (which is a
> good thing, I believe). The question is how to do the comparison...

Yeah :)

> I have implemented comments Peter had on my patches, and I have
> finished scripts for gathering mm statistics, processing trace output
> and plotting them. Looking at your test scripts, I can probably use
> some of your workloads, as mine are currently simpler. Currently I
> have some simple dd tests running; I'll run something over NFS,
> SATA+USB and hopefully several SATA drives next week.

The tests are pretty time consuming. Reusing the test scripts will
help save time and make the comparison easier.

> The question is how to compare results? Any ideas? Obvious metrics are
> overall throughput and fairness for IO bound tasks. But then there are

I guess there will be little difference in throughput, as long as the
iostat output shows 100% disk utilization and full-sized IOs
throughout.

As for fairness, I have the "ls-files" output for comparing the file
sizes created by the dd tasks. For example (the four sizes below agree
to within ~2%):

wfg ~/bee% cat xfs-4dd-1M-8p-970M-20%-2.6.38-rc6-dt6+-2011-02-25-21-55/ls-files
131 -rw-r--r-- 1 root root 2783969280 Feb 25 21:58 /fs/sda7/zero-1
132 -rw-r--r-- 1 root root 2772434944 Feb 25 21:58 /fs/sda7/zero-2
133 -rw-r--r-- 1 root root 2733637632 Feb 25 21:58 /fs/sda7/zero-3
134 -rw-r--r-- 1 root root 2735734784 Feb 25 21:58 /fs/sda7/zero-4

> more subtle things like how the algorithm behaves for tasks that are
> not IO bound for most of the time (or do less IO). Any good metrics
> here? More things we could compare?

For non-IO-bound tasks, there are fio job files that exercise
different dirty rates. I have not run them yet, though, as the
bandwidth-based algorithm obviously assigns higher bandwidth to light
dirtiers :)
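
For reference, a minimal fio job file in that spirit could look like
the following. The engine, sizes, rates and directory are made up for
illustration; they are not the actual job files:

[global]
ioengine=sync
rw=write
bs=1M
size=2g
directory=/fs/sda7
# buffered writes (the default) dirty the page cache at the given rate

[light-dirtier]
rate=1m

[heavy-dirtier]
rate=100m

Comparing the completion times and dirty throttle waits of the light
dirtier under both algorithms would show how well each shields light
dirtiers from heavy ones.

Thanks,
Fengguang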