On Sun, 2011-09-04 at 09:53 +0800, Wu Fengguang wrote:
> plain text document attachment (bdi-reserve-area)
> Keep a minimal pool of dirty pages for each bdi, so that the disk IO
> queues won't underrun.
>
> It's particularly useful for JBOD and small memory system.
>
> Note that this is not enough when memory is really tight (in comparison
> to write bandwidth). It may result in (pos_ratio > 1) at the setpoint
> and push the dirty pages high. This is more or less intended because the
> bdi is in the danger of IO queue underflow. However the global dirty
> pages, when pushed close to limit, will eventually counteract our desire
> to push up the low bdi_dirty.
>
> In low memory JBOD tests we do see disks under-utilized from time to
> time. The additional fix may be to add a BDI_async_underrun flag to
> indicate that the block write queue is running low and it's time to
> quickly fill the queue by unthrottling the tasks regardless of the
> global limit.
>
> Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx>
> ---
>  mm/page-writeback.c |   26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> --- linux-next.orig/mm/page-writeback.c 2011-08-26 20:12:19.000000000 +0800
> +++ linux-next/mm/page-writeback.c 2011-08-26 20:13:21.000000000 +0800
> @@ -487,6 +487,16 @@ unsigned long bdi_dirty_limit(struct bac
>   * 0 +------------.------------------.----------------------*------------->
>   *         freerun^          setpoint^                 limit^   dirty pages
>   *
> + * (o) bdi reserve area
> + *
> + * The bdi reserve area tries to keep a reasonable number of dirty pages for
> + * preventing block queue underrun.
> + *
> + * reserve area, scale up rate as dirty pages drop low
> + * |<----------------------------------------------->|
> + * |-------------------------------------------------------*-------|----------
> + * 0                                           bdi setpoint^       ^bdi_thresh

So why not call the thing bdi freerun?

>   * (o) bdi control lines
>   *
>   * The control lines for the global/bdi setpoints both stretch up to @limit.
> @@ -634,6 +644,22 @@ static unsigned long bdi_position_ratio(
>          pos_ratio *= x_intercept - bdi_dirty;
>          do_div(pos_ratio, x_intercept - bdi_setpoint + 1);
>
> +        /*
> +         * bdi reserve area, safeguard against dirty pool underrun and disk idle
> +         *
> +         * It may push the desired control point of global dirty pages higher
> +         * than setpoint. It's not necessary in single-bdi case because a
> +         * minimal pool of @freerun dirty pages will already be guaranteed.
> +         */
> +        x_intercept = min(write_bw, freerun);
> +        if (bdi_dirty < x_intercept) {

So the point of the freerun point is that we never throttle before it,
so basically all the below shouldn't be needed at all, right?

> +                if (bdi_dirty > x_intercept / 8) {
> +                        pos_ratio *= x_intercept;
> +                        do_div(pos_ratio, bdi_dirty);
> +                } else
> +                        pos_ratio *= 8;
> +        }
> +
>          return pos_ratio;
>  }

So why not add:

        if (likely(dirty < freerun))
                return 2;

at the start of this function and leave it at that?
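
For anyone following the arithmetic in the second hunk, here is a minimal
user-space sketch of the reserve-area scaling. It is not kernel code: the
function name reserve_area_boost() and the fixed-point scale of 1024 used
in the example values are assumptions for illustration, and plain 64-bit
division stands in for do_div(). Only the variable names come from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the reserve-area hunk quoted above.
 * pos_ratio is treated as a fixed-point value (assumed scale: 1024 == 1.0).
 */
static uint64_t reserve_area_boost(uint64_t pos_ratio, uint64_t bdi_dirty,
                                   uint64_t write_bw, uint64_t freerun)
{
        /* The reserve area ends at min(write_bw, freerun). */
        uint64_t x_intercept = write_bw < freerun ? write_bw : freerun;

        if (bdi_dirty < x_intercept) {
                if (bdi_dirty > x_intercept / 8) {
                        /* Scale up in proportion to x_intercept / bdi_dirty. */
                        pos_ratio = pos_ratio * x_intercept / bdi_dirty;
                } else {
                        /* Cap the boost at 8x; this also avoids dividing by 0. */
                        pos_ratio *= 8;
                }
        }

        return pos_ratio;
}
```

With write_bw = 800 and freerun = 1000, x_intercept is 800. A bdi with 400
dirty pages has its ratio doubled, and one at or below 100 pages gets the
capped 8x boost. At or above 800 dirty pages the ratio is left untouched.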