Re: + mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-c ache-references.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Aug 23, 2009 at 02:11:41AM +0800, Peter Zijlstra wrote:
> On Sat, 2009-08-22 at 10:51 +0800, Wu Fengguang wrote:
> 
> > > +++ a/mm/page-writeback.c
> 
> > > @@ -465,7 +439,6 @@ get_dirty_limits(unsigned long *pbackgro
> > >  			bdi_dirty = dirty * bdi->max_ratio / 100;
> > >  
> > >  		*pbdi_dirty = bdi_dirty;
> > >  		task_dirty_limit(current, pbdi_dirty);
> > >  	}
> > >  }
> > > @@ -499,45 +472,12 @@ static void balance_dirty_pages(struct a
> > >  		};
> > >  
> > >  		get_dirty_limits(&background_thresh, &dirty_thresh,
> > > +				 &bdi_thresh, bdi);
> > >  
> > >  		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> > > +			global_page_state(NR_UNSTABLE_NFS);
> > > +		nr_writeback = global_page_state(NR_WRITEBACK) +
> > > +			global_page_state(NR_WRITEBACK_TEMP);
> > >  
> > >  		/*
> > >  		 * In order to avoid the stacked BDI deadlock we need
> > > @@ -557,16 +497,48 @@ static void balance_dirty_pages(struct a
> > >  			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> > >  		}
> > >  
> > 
> > > +		/* always throttle if over threshold */
> > > +		if (nr_reclaimable + nr_writeback < dirty_thresh) {
> > 
> > That 'if' is a big behavior change. It effectively blocks every one
> > and canceled Peter's proportional throttling work: the less a process
> > dirtied, the less it should be throttled.
> 
> Hmm, I think you're right, I had not considered that, thanks for
> catching that.

Thank you :)

> > I'd propose to remove the above 'if' and liberate the following three 'if's.
> 
> That might work, but it looses the total dirty_thresh constraint. The
> sum of per-bdi dirties _should_ not be larger than that, but I'm not
> sure it won't ever be.
> 
> The clip code Richard removed ensured that, and I think I wrote that out
> of more than sheer paranoia, but I'm not sure anymore :/

Oh I assumed that your per-bdi throttling is not too permissive to
exceed the global dirty_thresh. In theory the per-bdi throttling
should be able to quickly stop the growing of (nr_reclaimable +
nr_writeback).  Once dirty_thresh is reached we already lose. 

> I'll go over the numeric stuff again to see where it could go wrong.
> 
> If we can bound the error (I'm suspecting it was some numerical error
> bound) we should be good and can indeed do this.

Yes, that error bound should be smaller than (dirty_thresh -
background_thresh) / 2, unless the user set the two thresholds
insanely close (for that we may add some sanity checks in
dirty_bytes_handler() and dirty_background_bytes_handler() etc.).

Anyway we may do some thing like this for now?

                if (dirty_thresh exceeded) {
                        WARN_ONCE
                        block write more
                }

Thanks,
Fengguang

> > > +
> > > +			if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> > > +				break;
> > > +
> > > +			/*
> > > +			 * Throttle it only when the background writeback cannot
> > > +			 * catch-up. This avoids (excessively) small writeouts
> > > +			 * when the bdi limits are ramping up.
> > > +			 */
> > > +			if (nr_reclaimable + nr_writeback <
> > > +			    (background_thresh + dirty_thresh) / 2)
> > > +				break;
> > > +
> > > +			/* done enough? */
> > > +			if (pages_written >= write_chunk)
> > > +				break;
> > > +		}
> > > +		if (!bdi->dirty_exceeded)
> > > +			bdi->dirty_exceeded = 1;
> > >  
> > > +		/* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
> > > +		 * Unstable writes are a feature of certain networked
> > > +		 * filesystems (i.e. NFS) in which data may have been
> > > +		 * written to the server's write cache, but has not yet
> > > +		 * been flushed to permanent storage.
> > > +		 * Only move pages to writeback if this bdi is over its
> > > +		 * threshold otherwise wait until the disk writes catch
> > > +		 * up.
> > > +		 */
> > > +		if (bdi_nr_reclaimable > bdi_thresh) {
> > > +			writeback_inodes(&wbc);
> > > +			pages_written += write_chunk - wbc.nr_to_write;
> > 
> > > +			if (wbc.nr_to_write == 0)
> > > +				continue;
> > 
> > What's the purpose of the above 2 lines?
> 
> I think I should slow down, I seem to have totally missed these two
> lines when I read the patch :/
> 
> > > +		}
> > >  		congestion_wait(BLK_RW_ASYNC, HZ/10);
> > >  	}
> > >  
> > >  	if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> > > +	    bdi->dirty_exceeded)
> > >  		bdi->dirty_exceeded = 0;
> > >  
> > >  	if (writeback_in_progress(bdi))
> > > @@ -580,10 +552,8 @@ static void balance_dirty_pages(struct a
> > >  	 * In normal mode, we start background writeout at the lower
> > >  	 * background_thresh, to keep the amount of dirty memory low.
> > >  	 */
> > > +	if ((laptop_mode && pages_written) || (!laptop_mode &&
> > > +	     (nr_reclaimable > background_thresh)))
> > >  		bdi_start_writeback(bdi, NULL, 0, WB_SYNC_NONE);
> > >  }
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux