On Sun 19-06-11 23:01:11, Wu Fengguang wrote: > The start of a heavy weight application (ie. KVM) may instantly knock > down determine_dirtyable_memory() and hence the global/bdi dirty > thresholds. > > So introduce global_dirty_limit for tracking the global dirty threshold > with policies > > - follow downwards slowly > - follow up in one shot > > global_dirty_limit can effectively mask out the impact of sudden drop of > dirtyable memory. It will be used in the next patch for two new type of > dirty limits. Looking into the code, this patch is dependent on previously submitted patches for estimation of BDI write bandwidth, isn't it? > > Signed-off-by: Wu Fengguang <fengguang.wu@xxxxxxxxx> > --- > include/linux/writeback.h | 2 + > mm/page-writeback.c | 41 ++++++++++++++++++++++++++++++++++++ > 2 files changed, 43 insertions(+) > > --- linux-next.orig/include/linux/writeback.h 2011-06-19 22:56:18.000000000 +0800 > +++ linux-next/include/linux/writeback.h 2011-06-19 22:59:29.000000000 +0800 > @@ -88,6 +88,8 @@ static inline void laptop_sync_completio > #endif > void throttle_vm_writeout(gfp_t gfp_mask); > > +extern unsigned long global_dirty_limit; > + > /* These are exported to sysctl. */ > extern int dirty_background_ratio; > extern unsigned long dirty_background_bytes; > --- linux-next.orig/mm/page-writeback.c 2011-06-19 22:56:18.000000000 +0800 > +++ linux-next/mm/page-writeback.c 2011-06-19 22:59:29.000000000 +0800 > @@ -116,6 +116,7 @@ EXPORT_SYMBOL(laptop_mode); > > /* End of sysctl-exported parameters */ > > +unsigned long global_dirty_limit; > > /* > * Scale the writeback cache size proportional to the relative writeout speeds. > @@ -510,6 +511,43 @@ static void bdi_update_write_bandwidth(s > bdi->avg_write_bandwidth = avg; > } > > +static void update_dirty_limit(unsigned long thresh, > + unsigned long dirty) > +{ > + unsigned long limit = global_dirty_limit; > + > + if (limit < thresh) { > + limit = thresh; > + goto update; > + } > + > + if (limit > thresh && > + limit > dirty) { > + limit -= (limit - max(thresh, dirty)) >> 5; > + goto update; > + } Hmm, but strictly speaking this never really converges to the new limit, right? And even in practice it takes 22 steps to converge within 50% and 73 steps to converge withing 10% of the desired threshold. But before we get into the discussion about how to update dirty threshold, I'd like to discuss what are the properties we require from dirty threshold updates? I understand it's kind of unexpected that when some process allocates anonymous memory, we suddently stall writers to flush out dirty data. OTOH it is a nice behavior from memory management point of view because we really try to keep amount of dirty pages in free memory at given percentage which is nice for reclaim. When we choose to update dirty limit in steps as you propose, we get some kind of compromise between behavior for user and behavior for memory management. But there are also other options - for example it would seem natural to me treat allocation of anonymous page the same way as dirtiying the page thus the process getting us over dirty threshold due to allocations will wait until enough pages are written. Then we wouldn't need any smoothing of memory available for caches because allocations would be naturally throttled (OK, we still have memory taken by kernel allocations but these are order of magnitude smaller problem I'd say). But I'm not sure how acceptable this idea would be. Honza > + return; > +update: > + global_dirty_limit = limit; > +} > + > +static void global_update_bandwidth(unsigned long thresh, > + unsigned long dirty, > + unsigned long now) > +{ > + static DEFINE_SPINLOCK(dirty_lock); > + > + if (now - default_backing_dev_info.bw_time_stamp < MAX_PAUSE) > + return; > + > + spin_lock(&dirty_lock); > + if (now - default_backing_dev_info.bw_time_stamp >= MAX_PAUSE) { > + update_dirty_limit(thresh, dirty); > + default_backing_dev_info.bw_time_stamp = now; > + } > + spin_unlock(&dirty_lock); > +} > + > void __bdi_update_bandwidth(struct backing_dev_info *bdi, > unsigned long thresh, > unsigned long dirty, > @@ -535,6 +573,9 @@ void __bdi_update_bandwidth(struct backi > if (elapsed > HZ && time_before(bdi->bw_time_stamp, start_time)) > goto snapshot; > > + if (thresh) > + global_update_bandwidth(thresh, dirty, now); > + > bdi_update_write_bandwidth(bdi, elapsed, written); > > snapshot: > > -- Jan Kara <jack@xxxxxxx> SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html