Re: + mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-cache-references.patch added to -mm tree

On Thu, 2009-09-03 at 19:05 +0800, Wu Fengguang wrote:
> On Thu, Sep 03, 2009 at 05:48:35PM +0800, Richard Kennedy wrote:
> > On Thu, 2009-09-03 at 10:22 +0800, Wu Fengguang wrote:
> > > On Wed, Sep 02, 2009 at 09:53:31PM +0800, Richard Kennedy wrote:
> > > > On Wed, 2009-09-02 at 12:45 +0200, Peter Zijlstra wrote:
> > > > > On Wed, 2009-09-02 at 17:57 +0800, Wu Fengguang wrote:
> > > > > > On Wed, Sep 02, 2009 at 04:31:40PM +0800, Peter Zijlstra wrote:
> > > > > > > On Sat, 2009-08-22 at 20:11 +0200, Peter Zijlstra wrote:
> > > > > > > > > > +           /* always throttle if over threshold */
> > > > > > > > > > +           if (nr_reclaimable + nr_writeback < dirty_thresh) {
> > > > > > > > > 
> > > > > > > > > That 'if' is a big behavior change. It effectively blocks everyone
> > > > > > > > > and cancels Peter's proportional throttling work: the less a process
> > > > > > > > > dirties, the less it should be throttled.
> > > > > > > > 
> > > > > > > > Hmm, I think you're right, I had not considered that, thanks for
> > > > > > > > catching that.
> > > > > > > 
> > > > > > > So in retrospect I think I might have been wrong here.
> > > > > > > 
> > > > > > > The per-task adjustment causes the bdi limit to be lower than the bdi
> > > > > > > limit based on writeback speed alone. That is, the more a task dirties,
> > > > > > > the lower the bdi limit is as seen by that task.
> > > > > > 
> > > > > > Right. If I understand it correctly, there will be a safety margin of
> > > > > > about (1/8) * dirty_limit in the single-heavy-dirtier case, and that gap
> > > > > > scales down when there are more concurrent heavy dirtiers.
> > > > > 
> > > > > Right, with, say, 4 heavy writers the gap will be 1/4 of 1/8, which
> > > > > is 1/32.
> > > > > 
> > > > > With the side note that I think 1/8 is too much on large memory systems,
> > > > > and I have posted a sqrt patch numerous times, but I don't think we've
> > > > > ever found out if that helps or not...
> > > > > 
> > > > > > In principle, the ceiling will be a bit higher for a light dirtier, so
> > > > > > that it can get through easily even in the presence of heavy dirtiers.
> > > > > 
> > > > > Right.
> > > > > 
> > > > > > > So if we get a task that generates tons of dirty pages (dd) then it
> > > > > > > won't ever actually hit the full dirty limit, even if it's the only task
> > > > > > > on the system, and this outer if() will always be true.
> > > > > > 
> > > > > > Right, we have the safety margin :)
> > > > > > 
> > > > > > > Only when we actually saturate the full dirty limit will we fall through
> > > > > > > and throttle, but that is ok -- we want to enforce the full limit.
> > > > > > > 
> > > > > > > In short, a very aggressive dirtier will have a bdi limit lower than the
> > > > > > > total limit (at all times) leaving a little room at the top for the
> > > > > > > occasional dirtier to make quick progress.
> > > > > > > 
> > > > > > > Wu, does that cover the scenario you had in mind?
> > > > > > 
> > > > > > Yes, thanks! Please correct me if I'm wrong:
> > > > > > - the lower-ceiling-for-heavier-dirtier algorithm in task_dirty_limit()
> > > > > >   is elegant enough to prevent heavy dirtiers from blocking light ones
> > > > > 
> > > > > ack
> > > > > 
> > > > > > - the test (nr_reclaimable + nr_writeback < dirty_thresh) is not
> > > > > >   relevant in normal operation, but can be kept for safety in the form of
> > > > > > 
> > > > > >           if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> > > > > >               nr_reclaimable + nr_writeback < dirty_thresh)
> > > > > >                   break;
> > > > > 
> > > > > ack
> > > > > 
> > > > > > - clip_bdi_dirty_limit() could be removed: we are already protected
> > > > > >   by the above test
> > > > > 
> > > > > ack.
> > > > 
> > > > 
> > > > I've noticed a difference in the handling of the dirty_exceeded
> > > > flag: because this change no longer clips bdi_thresh, the flag may
> > > > get cleared more quickly here:
> > > > 
> > > > 	if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> > > > 	    bdi->dirty_exceeded)
> > > > 		bdi->dirty_exceeded = 0;
> > > > 
> > > > So it could then call balance_dirty_pages() a lot less often.
> > > 
> > > I guess in normal situations, clip_bdi_dirty_limit() is simply a
> > > no-op, or just lowers bdi_thresh slightly (otherwise it could be a bug).
> > > So it could be removed without causing much side effects, including
> > > the influence on dirty_exceeded.
> > > 
> > > > I've got an updated version of this patch that moves the clip_bdi logic
> > > > up into balance_dirty_pages(), which should be closer to the existing
> > > > behavior, and tests so far look good. I can post it for comments if you're
> > > > interested.
> > > 
> > > So I suggest just removing clip_bdi_dirty_limit(). To be sure, you could
> > > run with the following patch and check whether big numbers are shown.
> > > 
> > > Thanks,
> > > Fengguang
> > > ---
> > Yes, when writing to one disk there's no difference, but what about writing
> > to multiple disks?
> > 
> > Can't we get into the situation where
> > 	nr_reclaimable + nr_writeback >= dirty_thresh
> > and a bdi is
> > 	bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh
> 
> This should also happen very infrequently. For the sake of safety we
> could create a local variable
> 
>         dirty_exceeded = (bdi_nr_reclaimable + bdi_nr_writeback >= bdi_thresh) ||
>                          (nr_reclaimable + nr_writeback >= dirty_thresh);
> 
> and use it throughout the function, i.e.
> 
>         if (!dirty_exceeded)
>                 break;
> and
> 	if (!dirty_exceeded && bdi->dirty_exceeded)
> 		bdi->dirty_exceeded = 0;
> ?
> 
> Thanks,
> Fengguang
Yep, that sounds good; I'll give it a try.
regards
Richard

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
