On Wed, Sep 02, 2009 at 09:53:31PM +0800, Richard Kennedy wrote: > On Wed, 2009-09-02 at 12:45 +0200, Peter Zijlstra wrote: > > On Wed, 2009-09-02 at 17:57 +0800, Wu Fengguang wrote: > > > On Wed, Sep 02, 2009 at 04:31:40PM +0800, Peter Zijlstra wrote: > > > > On Sat, 2009-08-22 at 20:11 +0200, Peter Zijlstra wrote: > > > > > > > + /* always throttle if over threshold */ > > > > > > > + if (nr_reclaimable + nr_writeback < dirty_thresh) { > > > > > > > > > > > > That 'if' is a big behavior change. It effectively blocks every one > > > > > > and canceled Peter's proportional throttling work: the less a process > > > > > > dirtied, the less it should be throttled. > > > > > > > > > > Hmm, I think you're right, I had not considered that, thanks for > > > > > catching that. > > > > > > > > So in retrospect I think I might have been wrong here. > > > > > > > > The per task thing causes the bdi limit to be lower than the bdi limit > > > > based on writeback speed alone. That is, the more a task dirties, the > > > > lower the bdi limit is as seen for that task. > > > > > > Right. If I understand it right, there will be a safety margin of about > > > (1/8) * dirty_limit for 1 heavy dirtier case, and that gap scales down > > > when there are more concurrent heavy dirtiers. > > > > Right, with say 4 heavy writers the gap will be 1/4-th of 1/8-th, which > > is 1/32-nd. > > > > With the side node that I think 1/8 is too much on large memory systems, > > and I have posted a sqrt patch numerous times, but I don't think we've > > ever found out if that helps or not... > > > > > In principle, the ceiling will be a bit higher for a light dirtier to > > > make it easy to pass in the presence of more heavy dirtiers. > > > > Right. > > > > > > So if we get a task that generates tons of dirty pages (dd) then it > > > > won't ever actually hit the full dirty limit, even if its the only task > > > > on the system, and this outer if() will always be true. > > > > > > Right, we have the safety margin :) > > > > > > > Only when we actually saturate the full dirty limit will we fall through > > > > and throttle, but that is ok -- we want to enforce the full limit. > > > > > > > > In short, a very aggressive dirtier will have a bdi limit lower than the > > > > total limit (at all times) leaving a little room at the top for the > > > > occasional dirtier to make quick progress. > > > > > > > > Wu, does that cover the scenario you had in mind? > > > > > > Yes thanks! Please correct me if wrong: > > > - the lower-ceiling-for-heavier-dirtier algorithm in task_dirty_limit() > > > is elegant enough to prevent heavy dirtier to block light ones > > > > ack > > > > > - the test (nr_reclaimable + nr_writeback < dirty_thresh) is not > > > relevant in normal, but can be kept for safety in the form of > > > > > > if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh && > > > nr_reclaimable + nr_writeback < dirty_thresh) > > > break; > > > > ack > > > > > - clip_bdi_dirty_limit() could be removed: we have been secured by the > > > above test > > > > ack. > > > I've noticed that there's a difference in the handling of the > dirty_exceeded flag, because this change no longer clips the bdi_thresh > then the flag may get cleared more quickly here :- > > if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh && > bdi->dirty_exceeded) > bdi->dirty_exceeded = 0; > > So it then could call balance_dirty_pages a lot less often. I guess in normal situations, clip_bdi_dirty_limit() is simply a no-op, or just lowers bdi_thresh slightly (otherwise could a bug). So it could be removed without causing much side effects, including the influence on dirty_exceeded. > I've got an updated version of this patch that moves the clip_bdi logic > up into balance_dirty_pages that should be closer to the existing > behavior & tests so far look good. I can post it for comments if you're > interested ? So I suggested just remove clip_bdi_dirty_limit(). To be sure, could run with the following patch and check if big numbers are showed. Thanks, Fengguang --- mm/page-writeback.c | 6 ++++++ 1 file changed, 6 insertions(+) --- linux-mm.orig/mm/page-writeback.c 2009-09-02 17:16:51.000000000 +0800 +++ linux-mm/mm/page-writeback.c 2009-09-03 10:19:00.000000000 +0800 @@ -258,6 +258,7 @@ static void clip_bdi_dirty_limit(struct unsigned long dirty, unsigned long *pbdi_dirty) { unsigned long avail_dirty; + unsigned long delta; avail_dirty = global_page_state(NR_FILE_DIRTY) + global_page_state(NR_WRITEBACK) + @@ -272,6 +273,11 @@ static void clip_bdi_dirty_limit(struct avail_dirty += bdi_stat(bdi, BDI_RECLAIMABLE) + bdi_stat(bdi, BDI_WRITEBACK); + delta = *pbdi_dirty - min(*pbdi_dirty, avail_dirty); + delta *= 1024; + delta /= *pbdi_dirty + 1; + printk("clip_bdi_dirty_limit %lu\n", delta); + *pbdi_dirty = min(*pbdi_dirty, avail_dirty); } -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html