Re: + mm-balance_dirty_pages-reduce-calls-to-global_page_state-to-reduce-c ache-references.patch added to -mm tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, 2009-08-22 at 10:51 +0800, Wu Fengguang wrote:

> > +++ a/mm/page-writeback.c

> > @@ -465,7 +439,6 @@ get_dirty_limits(unsigned long *pbackgro
> >  			bdi_dirty = dirty * bdi->max_ratio / 100;
> >  
> >  		*pbdi_dirty = bdi_dirty;
> >  		task_dirty_limit(current, pbdi_dirty);
> >  	}
> >  }
> > @@ -499,45 +472,12 @@ static void balance_dirty_pages(struct a
> >  		};
> >  
> >  		get_dirty_limits(&background_thresh, &dirty_thresh,
> > +				 &bdi_thresh, bdi);
> >  
> >  		nr_reclaimable = global_page_state(NR_FILE_DIRTY) +
> > +			global_page_state(NR_UNSTABLE_NFS);
> > +		nr_writeback = global_page_state(NR_WRITEBACK) +
> > +			global_page_state(NR_WRITEBACK_TEMP);
> >  
> >  		/*
> >  		 * In order to avoid the stacked BDI deadlock we need
> > @@ -557,16 +497,48 @@ static void balance_dirty_pages(struct a
> >  			bdi_nr_writeback = bdi_stat(bdi, BDI_WRITEBACK);
> >  		}
> >  
> 
> > +		/* always throttle if over threshold */
> > +		if (nr_reclaimable + nr_writeback < dirty_thresh) {
> 
> That 'if' is a big behavior change. It effectively blocks every one
> and canceled Peter's proportional throttling work: the less a process
> dirtied, the less it should be throttled.

Hmm, I think you're right, I had not considered that, thanks for
catching that.

> I'd propose to remove the above 'if' and liberate the following three 'if's.

That might work, but it looses the total dirty_thresh constraint. The
sum of per-bdi dirties _should_ not be larger than that, but I'm not
sure it won't ever be.

The clip code Richard removed ensured that, and I think I wrote that out
of more than sheer paranoia, but I'm not sure anymore :/

I'll go over the numeric stuff again to see where it could go wrong.

If we can bound the error (I'm suspecting it was some numerical error
bound) we should be good and can indeed do this.

> > +
> > +			if (bdi_nr_reclaimable + bdi_nr_writeback <= bdi_thresh)
> > +				break;
> > +
> > +			/*
> > +			 * Throttle it only when the background writeback cannot
> > +			 * catch-up. This avoids (excessively) small writeouts
> > +			 * when the bdi limits are ramping up.
> > +			 */
> > +			if (nr_reclaimable + nr_writeback <
> > +			    (background_thresh + dirty_thresh) / 2)
> > +				break;
> > +
> > +			/* done enough? */
> > +			if (pages_written >= write_chunk)
> > +				break;
> > +		}
> > +		if (!bdi->dirty_exceeded)
> > +			bdi->dirty_exceeded = 1;
> >  
> > +		/* Note: nr_reclaimable denotes nr_dirty + nr_unstable.
> > +		 * Unstable writes are a feature of certain networked
> > +		 * filesystems (i.e. NFS) in which data may have been
> > +		 * written to the server's write cache, but has not yet
> > +		 * been flushed to permanent storage.
> > +		 * Only move pages to writeback if this bdi is over its
> > +		 * threshold otherwise wait until the disk writes catch
> > +		 * up.
> > +		 */
> > +		if (bdi_nr_reclaimable > bdi_thresh) {
> > +			writeback_inodes(&wbc);
> > +			pages_written += write_chunk - wbc.nr_to_write;
> 
> > +			if (wbc.nr_to_write == 0)
> > +				continue;
> 
> What's the purpose of the above 2 lines?

I think I should slow down, I seem to have totally missed these two
lines when I read the patch :/

> > +		}
> >  		congestion_wait(BLK_RW_ASYNC, HZ/10);
> >  	}
> >  
> >  	if (bdi_nr_reclaimable + bdi_nr_writeback < bdi_thresh &&
> > +	    bdi->dirty_exceeded)
> >  		bdi->dirty_exceeded = 0;
> >  
> >  	if (writeback_in_progress(bdi))
> > @@ -580,10 +552,8 @@ static void balance_dirty_pages(struct a
> >  	 * In normal mode, we start background writeout at the lower
> >  	 * background_thresh, to keep the amount of dirty memory low.
> >  	 */
> > +	if ((laptop_mode && pages_written) || (!laptop_mode &&
> > +	     (nr_reclaimable > background_thresh)))
> >  		bdi_start_writeback(bdi, NULL, 0, WB_SYNC_NONE);
> >  }


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux