On Fri, 16 Sep 2011 20:39:11 -0700 Michel Lespinasse <walken@xxxxxxxxxx> wrote: > Scan some number of pages from each node every second, instead of trying to > scan the entime memory at once and being idle for the rest of the configured > interval. Well... why? The amount of work done per scan interval is the same (actually, it will be slightly increased due to cache evictions). I think we should see a good explanation of what observed problem this hackery^Wtweak is trying to solve. Once that is revealed, we can compare the proposed solution with one based on thread policy/priority (for example). > > .... > > @@ -5788,21 +5800,60 @@ static int kstaled(void *dummy) > */ > BUG_ON(scan_seconds <= 0); > > - for_each_mem_cgroup_all(mem) > - memset(&mem->idle_scan_stats, 0, > - sizeof(mem->idle_scan_stats)); > + earlier = jiffies; > > + scan_done = true; > for_each_node_state(nid, N_HIGH_MEMORY) > - kstaled_scan_node(NODE_DATA(nid)); > + scan_done &= kstaled_scan_node(NODE_DATA(nid), > + scan_seconds, reset); > + > + if (scan_done) { > + struct mem_cgroup *mem; > + > + for_each_mem_cgroup_all(mem) { > + write_seqcount_begin(&mem->idle_page_stats_lock); > + mem->idle_page_stats = mem->idle_scan_stats; > + mem->idle_page_scans++; > + write_seqcount_end(&mem->idle_page_stats_lock); > + memset(&mem->idle_scan_stats, 0, > + sizeof(mem->idle_scan_stats)); > + } > + } > > - for_each_mem_cgroup_all(mem) { > - write_seqcount_begin(&mem->idle_page_stats_lock); > - mem->idle_page_stats = mem->idle_scan_stats; > - mem->idle_page_scans++; > - write_seqcount_end(&mem->idle_page_stats_lock); > + delta = jiffies - earlier; > + if (delta < HZ / 2) { > + delayed = 0; > + schedule_timeout_interruptible(HZ - delta); > + } else { > + /* > + * Emergency throttle if we're taking too long. > + * We are supposed to scan an entire slice in 1 second. > + * If we keep taking longer for 10 consecutive times, > + * scale back our scan_seconds. > + * > + * If someone changed kstaled_scan_seconds while we > + * were running, hope they know what they're doing and > + * assume they've eliminated any delays. > + */ > + bool updated = false; > + spin_lock(&kstaled_scan_seconds_lock); > + if (scan_seconds != kstaled_scan_seconds) > + delayed = 0; > + else if (++delayed == 10) { > + delayed = 0; > + scan_seconds *= 2; > + kstaled_scan_seconds = scan_seconds; > + updated = true; > + } > + spin_unlock(&kstaled_scan_seconds_lock); > + if (updated) > + pr_warning("kstaled taking too long, " > + "scan_seconds now %d\n", > + scan_seconds); > + schedule_timeout_interruptible(HZ / 2); This is all rather unpleasing. > > - schedule_timeout_interruptible(scan_seconds * HZ); > + reset = scan_done; > } > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>