On Wed, Jan 11, 2012 at 9:04 AM, Milton Miller <miltonm@xxxxxxx> wrote: >> > > @@ -1097,7 +1105,19 @@ void drain_local_pages(void *arg) >> > > */ >> > > void drain_all_pages(void) >> > > { >> > > - on_each_cpu(drain_local_pages, NULL, 1); >> > > + int cpu; >> > > + struct per_cpu_pageset *pcp; >> > > + struct zone *zone; >> > > + >> > > + for_each_online_cpu(cpu) >> > > + for_each_populated_zone(zone) { >> > > + pcp = per_cpu_ptr(zone->pageset, cpu); >> > > + if (pcp->pcp.count) >> > > + cpumask_set_cpu(cpu, cpus_with_pcps); >> > > + else >> > > + cpumask_clear_cpu(cpu, cpus_with_pcps); >> > > + } >> > > + on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1); >> > > > This will only select cpus that have pcp pages in the last zone, > not pages in any populated zone. Oy! you are very right. > > Call cpu_mask_clear before the for_each_online_cpu loop, and only > set cpus in the loop. > > Hmm, that means we actually need a lock for using the static mask. We don't have to clear before the loop or lock. We can do something like this: int cpu; struct per_cpu_pageset *pcp; struct zone *zone; for_each_online_cpu(cpu) { bool has_pcps = false; for_each_populated_zone(zone) { pcp = per_cpu_ptr(zone->pageset, cpu); if (pcp->pcp.count) { has_pcps = true; break; } } if (has_pcps) cpumask_set_cpu(cpu, cpus_with_pcps); else cpumask_clear_cpu(cpu, cpus_with_pcps); } on_each_cpu_mask(cpus_with_pcps, drain_local_pages, NULL, 1); > > -- what would happen without any lock? > [The last to store would likely see all the cpus and send them IPIs > but the earlier, racing callers would miss some cpus (It will test > our smp_call_function_many locking!), and would retry before all the > pages were drained from pcp pools]. The locking should be better > than racing -- one cpu will peek and pull all per-cpu data to itself, > then copy the mask to its smp_call_function_many element, then wait > for all the cpus it found cpus to process their list pulling their > pcp data back to the owners. Not much sense in the racing cpus > figighting to bring the per-cpu data to themselves to write to the > now contended static mask while pulling the zone pcp data from the > owning cpus that are trying to push to the buddy lists. > That is a very good point and I guess you are right that the locking saves cache bounces and probably also some IPIs. I am not 100% it is a win though. The lockless approach is simpler, has zero risk of dead locks. I also fear that any non interruptable lock (be it spinlock or interruptable lock non mutex) might delay an OOM in progress. Having the lock there is not a correctness issue - it is a performance enhancement. Because this is not a fast path, i would go for simple and less risk and not have the lock there at all. > The benefit numbers in the changelog need to be measured again > after this correction. > Yes, of course. Preliminary numbers with the lockless version above still looks good. I am guessing though that Mel's patch will make all this moot anyway since if we're not doing a drain_all in the direct reclaim we're left with very rare code paths (memory error and memory migration) that call the drain_all and they wont tend to do that concurrently like the direct reclaim. > > > Disabling preemption around online loop and the call will prevent > races with cpus going offline, but we do not that we care as the > offline notification will cause the notified cpu to drain that pool. > on_each_cpu_mask disables preemption as part of its processing. > > If we document the o-e-c-m behaviour then we should consider putting a > comment here, or at least put the previous paragraph in the change log. I noticed that Mel's patch already added get_online_cpus() in drain_all. > >> > > } >> > > >> > > #ifdef CONFIG_HIBERNATION >> > > @@ -3601,6 +3621,10 @@ static void setup_zone_pageset(struct zone *zone) >> > > void __init setup_per_cpu_pageset(void) >> > > { >> > > struct zone *zone; >> > > + int ret; >> > > + >> > > + ret = zalloc_cpumask_var(&cpus_with_pcps, GFP_KERNEL); >> > > + BUG_ON(!ret); >> > > > > Switching to cpumask_t will eliminate this hunk. Even if we decide > to keep it a cpumask_var_t we don't need to pre-zero it as we set > the entire mask so alloc_cpumask_var would be sufficient. > hmm... then we can make it a static local variable and it doesn't even make the kernel image really bigger since it's in the BSS. Cool. Thanks, Gilad -- Gilad Ben-Yossef Chief Coffee Drinker gilad@xxxxxxxxxxxxx Israel Cell: +972-52-8260388 US Cell: +1-973-8260388 http://benyossef.com "Unfortunately, cache misses are an equal opportunity pain provider." -- Mike Galbraith, LKML -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html