On Tue, 2012-06-26 at 23:41 -0700, Andrew Morton wrote:
> On Wed, 27 Jun 2012 15:33:09 +0900 Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> > Anyway, let's wait for further answers, especially from the RT folks.
>
> rt folks said "it isn't changing", and I agree with them. It isn't
> worth breaking the rt-prio quality of service because a few odd parts
> of the kernel did something inappropriate. Especially when those
> few sites have alternatives.

I'm not exactly sure it's a 'few' sites... but yeah, there are a few
obvious sites we should look at.

AFAICT all lru_add_drain_all() callers do this optimistically, especially
since there's no hard synchronization against adding new entries to the
per-cpu pagevecs. So there's no hard requirement to wait for completion.
Not waiting has obvious problems as well, of course, but we could cheat
and time out after a few jiffies or so.

That would avoid the DoS scenario; it won't improve the overall quality
of the kernel though, since an unflushed pagevec can still result in
compaction etc. failing.

The problem with stuffing all this into hardirq context (using
on_each_cpu() and friends) is that the people who spin in FIFO threads
generally don't like interrupt latencies forced on them either. And I
presume the drain is currently done from scheduled work because it's
potentially quite expensive to flush all these pages.

The only alternative I can come up with is scheduling the work like we do
now, waiting for it for a few jiffies, tracking which CPUs completed,
cancelling the others, and remote-flushing their pagevecs from the
calling CPU (a rough sketch of the bounded-wait part is at the end of
this mail). But I can't say I like that option either...

As it stands, I've always said that doing while(1) from FIFO/RR tasks is
broken and you get to keep the pieces. If we can find good solutions for
this I'm all ears, but I don't think it's something we should bend over
backwards for.
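
For illustration only, here is a minimal sketch of what the "schedule the
drain work, but only wait a bounded time" idea could look like. This is
not a proposed patch: lru_add_drain_timeout(), drain_work and drain_works
are made-up names, and a real version would still have to deal with the
stragglers (cancelling the pending work and/or draining those CPUs
remotely). It only uses existing primitives (schedule_work_on(),
wait_for_completion_timeout(), lru_add_drain()).

#include <linux/cpu.h>
#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/percpu.h>
#include <linux/swap.h>
#include <linux/workqueue.h>

struct drain_work {
	struct work_struct work;
	struct completion done;
};

static DEFINE_PER_CPU(struct drain_work, drain_works);

static void drain_local_pagevecs(struct work_struct *w)
{
	struct drain_work *dw = container_of(w, struct drain_work, work);

	lru_add_drain();		/* drain this CPU's pagevecs */
	complete(&dw->done);
}

/* Returns the number of CPUs that did not finish within 'timeout'. */
static int lru_add_drain_timeout(unsigned long timeout)
{
	int cpu, missed = 0;

	get_online_cpus();

	for_each_online_cpu(cpu) {
		struct drain_work *dw = &per_cpu(drain_works, cpu);

		/*
		 * Note: re-initialising a work item that may still be
		 * pending from a previous (timed-out) call is broken; a
		 * real implementation would need to cancel or reuse it.
		 */
		INIT_WORK(&dw->work, drain_local_pagevecs);
		init_completion(&dw->done);
		schedule_work_on(cpu, &dw->work);
	}

	for_each_online_cpu(cpu) {
		struct drain_work *dw = &per_cpu(drain_works, cpu);

		/* Bounded wait; give up on CPUs hogged by FIFO spinners. */
		if (!wait_for_completion_timeout(&dw->done, timeout))
			missed++;
	}

	put_online_cpus();

	return missed;
}

The caller would then get to decide what to do about the missed CPUs:
give up, cancel the work, or flush those pagevecs remotely as described
above.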