On Mon, 17 Jun 2013 17:39:36 -0400 Raphaël Beamonte <raphael.beamonte@xxxxxxxxx> wrote:

> 2013/6/17 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>
> > That change wasn't terribly efficient - if there are any unpopulated
> > pages in the range (which is quite likely), fadvise() will now always
> > call invalidate_mapping_pages() a second time.
> >
> > Perhaps this is fixable.  Say, make lru_add_drain_all() return a
> > success code, or even teach lru_add_drain_all() to return a code
> > indicating that one of the spilled pages was (or might have been) on a
> > particular mapping.
> >
>
> Following our tests results, that was the call to lru_add_drain_all() that
> causes the problem. The second call to invalidate_mapping_pages() isn't
> really important. We tried to compile a kernel with the commit introducing
> this change but with the "lru_add_drain_all()" line removed, and the
> problem disappeared, even if we called two times invalidate_mapping_pages()
> (as the rest of the commit was still here).

Ah, OK, schedule_on_each_cpu() could certainly do that - it has to wait
for every CPU to context switch and schedule the worker function.

There's a lot we could do here.  Such as not doing the schedule_work()
at all for a cpu which has an empty lru_add_pvec.  Or even pass down
the address_space and only schedule the work for CPUs which have a page
from *this mapping* in their lru_add_pvec.

That will all be highly racy, but as long as the failure mode is
"flushed unnecessarily" then that's OK.
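
To make the two filtering ideas concrete, here is a rough userspace model in C. It is not kernel code and not a proposed patch: the types (`model_page`, `model_pvec`), the helpers (`pvec_needs_drain`, `cpus_to_drain`) and the constants are all hypothetical names invented for illustration, and the per-CPU lru_add_pvec is modelled as a plain array with no concurrency at all. It only shows the selection logic: skip CPUs whose pending vector is empty and, when a mapping is passed down, skip CPUs holding no page from that mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model constants, chosen arbitrarily for this sketch. */
#define MAX_CPUS  8
#define PVEC_SIZE 14

/* Stand-in for a page: only the owning mapping matters here. */
struct model_page {
    const void *mapping;
};

/* Stand-in for one CPU's pending lru_add_pvec. */
struct model_pvec {
    size_t nr;                          /* number of pending pages */
    struct model_page pages[PVEC_SIZE];
};

/*
 * Decide whether one CPU needs its pending pages flushed.
 * mapping == NULL means "any pending page counts" (the empty-pvec check);
 * otherwise only pages belonging to that mapping count.
 */
static bool pvec_needs_drain(const struct model_pvec *pvec, const void *mapping)
{
    for (size_t i = 0; i < pvec->nr; i++) {
        if (mapping == NULL || pvec->pages[i].mapping == mapping)
            return true;
    }
    return false;                       /* empty, or nothing from this mapping */
}

/*
 * Build a bitmask of the CPUs on which drain work would be scheduled.
 * Bit N set means "schedule the work on CPU N".
 */
static unsigned int cpus_to_drain(const struct model_pvec *pvecs, size_t ncpus,
                                  const void *mapping)
{
    unsigned int mask = 0;

    assert(ncpus <= MAX_CPUS);
    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        if (pvec_needs_drain(&pvecs[cpu], mapping))
            mask |= 1u << cpu;
    }
    return mask;
}
```

A caller would compute the mask once and schedule work only for the CPUs whose bit is set. In a real kernel the check would read another CPU's vector without synchronisation, so the answer can be stale; as noted above, that is tolerable only while a stale answer leads to an unnecessary flush.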