On Mon, Sep 12, 2011 at 06:19:38PM +0800, Peter Zijlstra wrote:
> On Wed, 2011-09-07 at 20:31 +0800, Wu Fengguang wrote:
> > > > +	x_intercept = min(write_bw, freerun);
> > > > +	if (bdi_dirty < x_intercept) {
> > >
> > > So the point of the freerun point is that we never throttle before it,
> > > so basically all the below shouldn't be needed at all, right?
> >
> > Yes!
> >
> > > > +		if (bdi_dirty > x_intercept / 8) {
> > > > +			pos_ratio *= x_intercept;
> > > > +			do_div(pos_ratio, bdi_dirty);
> > > > +		} else
> > > > +			pos_ratio *= 8;
> > > > +	}
> > > > +
> > > >  	return pos_ratio;
> > > >  }
>
> Does that mean we can remove this whole block?

Right, if the bdi freerun concept proves to work fine.

Unfortunately, I find that it mostly yields lower performance than the
bdi reserve area: bdi-freerun writes fewer pages in 16 of the 19 cases
below. The patch is attached. If you would like me to try other
patches, I can easily kick off new tests and redo the comparison.

Here are the nr_written numbers for various JBOD test cases, the
larger, the better:

bdi-reserve  bdi-freerun     diff  case
---------------------------------------------------------------------------------------
   38375271     31553807   -17.8%  JBOD-10HDD-6G/xfs-100dd-1M-16p-5895M-20
   30478879     28631491    -6.1%  JBOD-10HDD-6G/xfs-10dd-1M-16p-5895M-20
   29735407     28871956    -2.9%  JBOD-10HDD-6G/xfs-1dd-1M-16p-5895M-20
   30850350     28344165    -8.1%  JBOD-10HDD-6G/xfs-2dd-1M-16p-5895M-20
   17706200     16174684    -8.6%  JBOD-10HDD-thresh=100M/xfs-100dd-1M-16p-5895M-100M
   23374918     14376942   -38.5%  JBOD-10HDD-thresh=100M/xfs-10dd-1M-16p-5895M-100M
   20659278     19640375    -4.9%  JBOD-10HDD-thresh=100M/xfs-1dd-1M-16p-5895M-100M
   22517497     14552321   -35.4%  JBOD-10HDD-thresh=100M/xfs-2dd-1M-16p-5895M-100M
   68287850     61078553   -10.6%  JBOD-10HDD-thresh=2G/xfs-100dd-1M-16p-5895M-2048M
   33835247     32018425    -5.4%  JBOD-10HDD-thresh=2G/xfs-10dd-1M-16p-5895M-2048M
   30187817     29942083    -0.8%  JBOD-10HDD-thresh=2G/xfs-1dd-1M-16p-5895M-2048M
   30563144     30204022    -1.2%  JBOD-10HDD-thresh=2G/xfs-2dd-1M-16p-5895M-2048M
   34476862     34645398    +0.5%  JBOD-10HDD-thresh=4G/xfs-10dd-1M-16p-5895M-4096M
   30326479     30097263    -0.8%  JBOD-10HDD-thresh=4G/xfs-1dd-1M-16p-5895M-4096M
   30446767     30339683    -0.4%  JBOD-10HDD-thresh=4G/xfs-2dd-1M-16p-5895M-4096M
   40793956     45936678   +12.6%  JBOD-10HDD-thresh=800M/xfs-100dd-1M-16p-5895M-800M
   27481305     24867282    -9.5%  JBOD-10HDD-thresh=800M/xfs-10dd-1M-16p-5895M-800M
   25651257     22507406   -12.3%  JBOD-10HDD-thresh=800M/xfs-1dd-1M-16p-5895M-800M
   19849350     21298787    +7.3%  JBOD-10HDD-thresh=800M/xfs-2dd-1M-16p-5895M-800M

raw data by "grep":

JBOD-10HDD-6G/xfs-100dd-1M-16p-5895M-20:10-3.1.0-rc4+/vmstat-end:nr_written 38375271
JBOD-10HDD-6G/xfs-10dd-1M-16p-5895M-20:10-3.1.0-rc4+/vmstat-end:nr_written 30478879
JBOD-10HDD-6G/xfs-1dd-1M-16p-5895M-20:10-3.1.0-rc4+/vmstat-end:nr_written 29735407
JBOD-10HDD-6G/xfs-2dd-1M-16p-5895M-20:10-3.1.0-rc4+/vmstat-end:nr_written 30850350
JBOD-10HDD-thresh=100M/xfs-100dd-1M-16p-5895M-100M:10-3.1.0-rc4+/vmstat-end:nr_written 17706200
JBOD-10HDD-thresh=100M/xfs-10dd-1M-16p-5895M-100M:10-3.1.0-rc4+/vmstat-end:nr_written 23374918
JBOD-10HDD-thresh=100M/xfs-1dd-1M-16p-5895M-100M:10-3.1.0-rc4+/vmstat-end:nr_written 20659278
JBOD-10HDD-thresh=100M/xfs-2dd-1M-16p-5895M-100M:10-3.1.0-rc4+/vmstat-end:nr_written 22517497
JBOD-10HDD-thresh=2G/xfs-100dd-1M-16p-5895M-2048M:10-3.1.0-rc4+/vmstat-end:nr_written 68287850
JBOD-10HDD-thresh=2G/xfs-10dd-1M-16p-5895M-2048M:10-3.1.0-rc4+/vmstat-end:nr_written 33835247
JBOD-10HDD-thresh=2G/xfs-1dd-1M-16p-5895M-2048M:10-3.1.0-rc4+/vmstat-end:nr_written 30187817
JBOD-10HDD-thresh=2G/xfs-2dd-1M-16p-5895M-2048M:10-3.1.0-rc4+/vmstat-end:nr_written 30563144
JBOD-10HDD-thresh=4G/xfs-10dd-1M-16p-5895M-4096M:10-3.1.0-rc4+/vmstat-end:nr_written 34476862
JBOD-10HDD-thresh=4G/xfs-1dd-1M-16p-5895M-4096M:10-3.1.0-rc4+/vmstat-end:nr_written 30326479
JBOD-10HDD-thresh=4G/xfs-2dd-1M-16p-5895M-4096M:10-3.1.0-rc4+/vmstat-end:nr_written 30446767
JBOD-10HDD-thresh=800M/xfs-100dd-1M-16p-5895M-800M:10-3.1.0-rc4+/vmstat-end:nr_written 40793956
JBOD-10HDD-thresh=800M/xfs-10dd-1M-16p-5895M-800M:10-3.1.0-rc4+/vmstat-end:nr_written 27481305
JBOD-10HDD-thresh=800M/xfs-1dd-1M-16p-5895M-800M:10-3.1.0-rc4+/vmstat-end:nr_written 25651257
JBOD-10HDD-thresh=800M/xfs-2dd-1M-16p-5895M-800M:10-3.1.0-rc4+/vmstat-end:nr_written 19849350
JBOD-10HDD-6G/xfs-100dd-1M-16p-5895M-20:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 31553807
JBOD-10HDD-6G/xfs-10dd-1M-16p-5895M-20:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 28631491
JBOD-10HDD-6G/xfs-1dd-1M-16p-5895M-20:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 28871956
JBOD-10HDD-6G/xfs-2dd-1M-16p-5895M-20:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 28344165
JBOD-10HDD-thresh=100M/xfs-100dd-1M-16p-5895M-100M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 16174684
JBOD-10HDD-thresh=100M/xfs-10dd-1M-16p-5895M-100M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 14376942
JBOD-10HDD-thresh=100M/xfs-1dd-1M-16p-5895M-100M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 19640375
JBOD-10HDD-thresh=100M/xfs-2dd-1M-16p-5895M-100M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 14552321
JBOD-10HDD-thresh=2G/xfs-100dd-1M-16p-5895M-2048M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 61078553
JBOD-10HDD-thresh=2G/xfs-10dd-1M-16p-5895M-2048M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 32018425
JBOD-10HDD-thresh=2G/xfs-1dd-1M-16p-5895M-2048M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 29942083
JBOD-10HDD-thresh=2G/xfs-2dd-1M-16p-5895M-2048M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 30204022
JBOD-10HDD-thresh=4G/xfs-10dd-1M-16p-5895M-4096M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 34645398
JBOD-10HDD-thresh=4G/xfs-1dd-1M-16p-5895M-4096M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 30097263
JBOD-10HDD-thresh=4G/xfs-2dd-1M-16p-5895M-4096M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 30339683
JBOD-10HDD-thresh=800M/xfs-100dd-1M-16p-5895M-800M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 45936678
JBOD-10HDD-thresh=800M/xfs-10dd-1M-16p-5895M-800M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 24867282
JBOD-10HDD-thresh=800M/xfs-1dd-1M-16p-5895M-800M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 22507406
JBOD-10HDD-thresh=800M/xfs-2dd-1M-16p-5895M-800M:10-3.1.0-rc4-bdi-freerun+/vmstat-end:nr_written 21298787
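The diff column in the table is consistent with 100 * (bdi-freerun - bdi-reserve) / bdi-reserve, rounded to one decimal place; that formula reproduces every row. A minimal C sketch of the arithmetic, assuming that formula; pct_change_tenths() is a hypothetical helper that returns tenths of a percent so results can be compared exactly:

```c
#include <assert.h>

/*
 * Hypothetical helper: relative change of bdi-freerun vs bdi-reserve,
 * in tenths of a percent, rounded half away from zero
 * (so a -17.8% change is returned as -178).
 */
static long pct_change_tenths(long reserve, long freerun)
{
	double tenths;

	assert(reserve > 0);	/* baseline nr_written must be positive */
	tenths = 1000.0 * (double)(freerun - reserve) / (double)reserve;
	return (long)(tenths >= 0 ? tenths + 0.5 : tenths - 0.5);
}
```

For example, pct_change_tenths(38375271, 31553807) returns -178, i.e. the -17.8% of the first row.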
Subject:
Date: Wed Sep 14 22:57:43 CST 2011

Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
---
 mm/page-writeback.c |   26 ++++++++------------------
 1 file changed, 8 insertions(+), 18 deletions(-)

--- linux-next.orig/mm/page-writeback.c	2011-09-14 22:50:33.000000000 +0800
+++ linux-next/mm/page-writeback.c	2011-09-14 22:58:15.000000000 +0800
@@ -614,22 +614,6 @@ static unsigned long bdi_position_ratio(
 	} else
 		pos_ratio /= 4;
 
-	/*
-	 * bdi reserve area, safeguard against dirty pool underrun and disk idle
-	 *
-	 * It may push the desired control point of global dirty pages higher
-	 * than setpoint. It's not necessary in single-bdi case because a
-	 * minimal pool of @freerun dirty pages will already be guaranteed.
-	 */
-	x_intercept = min(write_bw, freerun);
-	if (bdi_dirty < x_intercept) {
-		if (bdi_dirty > x_intercept / 8) {
-			pos_ratio *= x_intercept;
-			do_div(pos_ratio, bdi_dirty);
-		} else
-			pos_ratio *= 8;
-	}
-
 	return pos_ratio;
 }
 
@@ -1089,8 +1073,14 @@ static void balance_dirty_pages(struct a
 				     nr_dirty, bdi_thresh, bdi_dirty,
 				     start_time);
 
-		if (unlikely(!dirty_exceeded && bdi_async_underrun(bdi)))
-			break;
+		freerun = min(bdi->avg_write_bandwidth + MIN_WRITEBACK_PAGES,
+			      global_dirty_limit - nr_dirty) / 8;
+		if (!dirty_exceeded) {
+			if (unlikely(bdi_dirty < freerun))
+				break;
+			if (unlikely(bdi_async_underrun(bdi)))
+				break;
+		}
 
 		max_pause = bdi_max_pause(bdi, bdi_dirty);
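To make the trade-off concrete, here is a hypothetical userspace model of the two policies: reserve_boost() mirrors the bdi reserve area removed from bdi_position_ratio(), and bdi_freerun()/skip_throttle() mirror the early exit the patch adds to balance_dirty_pages(). The function names, the plain integer types, and the value of MIN_WRITEBACK_PAGES (1024 pages, i.e. 4 MB of 4 KiB pages) are assumptions for illustration; write_bw, freerun, and the result of bdi_async_underrun() are taken as plain inputs rather than computed as in the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model; all counts are in pages.
 * Assumed value: 4 MB worth of 4 KiB pages. The kernel defines its own.
 */
#define MIN_WRITEBACK_PAGES 1024UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Removed "bdi reserve area": while bdi_dirty is below x_intercept,
 * scale pos_ratio by x_intercept / bdi_dirty, capped at 8x.
 */
static unsigned long long reserve_boost(unsigned long long pos_ratio,
					unsigned long write_bw,
					unsigned long freerun,
					unsigned long bdi_dirty)
{
	unsigned long x_intercept = min_ul(write_bw, freerun);

	if (bdi_dirty < x_intercept) {
		if (bdi_dirty > x_intercept / 8) {
			/* bdi_dirty > x_intercept / 8 >= 0, so no division by zero */
			assert(bdi_dirty > 0);
			pos_ratio = pos_ratio * x_intercept / bdi_dirty;
		} else {
			pos_ratio *= 8;
		}
	}
	return pos_ratio;
}

/*
 * Added freerun threshold: an eighth of the smaller of
 * (write bandwidth in pages/s + MIN_WRITEBACK_PAGES) and the global
 * dirty headroom. As in the patch, the subtraction is unsigned and
 * wraps if nr_dirty exceeds global_dirty_limit.
 */
static unsigned long bdi_freerun(unsigned long avg_write_bandwidth,
				 unsigned long global_dirty_limit,
				 unsigned long nr_dirty)
{
	return min_ul(avg_write_bandwidth + MIN_WRITEBACK_PAGES,
		      global_dirty_limit - nr_dirty) / 8;
}

/* True when the dirtier would leave the throttle loop without pausing. */
static bool skip_throttle(bool dirty_exceeded, unsigned long bdi_dirty,
			  unsigned long freerun, bool async_underrun)
{
	if (dirty_exceeded)
		return false;
	return bdi_dirty < freerun || async_underrun;
}
```

The design difference shows up directly in the model: the reserve area keeps throttling but raises pos_ratio (by up to 8x) as bdi_dirty drops below x_intercept, whereas the freerun check skips throttling entirely while bdi_dirty stays below freerun.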