On Thu, 17 Mar 2011 16:27:29 -0500 Alex Villacís Lasso <avillaci@xxxxxxxxxxxxxxxxx> wrote:

> > So it appears that the system is full of dirty pages against a slow
> > device and your foreground processes have got stuck in direct reclaim
> > -> compaction -> migration.  That's Mel ;)
> >
> > What happened to the plans to eliminate direct reclaim?
> >
> >
> Browsing around bugzilla, I believe that bug 12309 looks very similar
> to the issue I am experiencing, especially from comment #525 onwards.
> Am I correct in this?

ah, the epic 12309.  https://bugzilla.kernel.org/show_bug.cgi?id=12309.
If you're ever wondering how much we suck, go read that one.

I think what we're seeing in 31142 is a large amount of dirty data
buffered against a slow device.  Innocent processes enter page reclaim
and end up getting stuck trying to write to that heavily-queued and
slow device.

If so, that's probably what some of the 12309 participants are seeing.
But there are lots of other things in that report too.

Now, the problem you're seeing in 31142 isn't really supposed to
happen.  In the direct-reclaim case the code will try to avoid
initiation of blocking I/O against a congested device, via the
bdi_write_congested() test in may_write_to_queue().  Although that code
now looks a bit busted for the order>PAGE_ALLOC_COSTLY_ORDER case,
whodidthat.

However in the case of the new(ish) compaction/migration code I don't
think we're performing that test.  migrate_pages()->unmap_and_move()
will get stuck behind that large&slow IO queue if page reclaim decided
to pass it down sync==true, as it apparently has done.

IOW, Mel broke it ;)
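
To make the contrast between the two paths concrete, here is a rough userspace sketch. It is a toy model, not the kernel source: every toy_* name is hypothetical, the bool field stands in for whatever bdi_write_congested() would report, and the costly-order bypass is an assumption about the case the mail calls "a bit busted". TOY_COSTLY_ORDER mirrors PAGE_ALLOC_COSTLY_ORDER, which is 3 in the kernel.

```c
#include <stdbool.h>

/* Mirrors PAGE_ALLOC_COSTLY_ORDER (3 in the kernel). */
#define TOY_COSTLY_ORDER 3

/* Hypothetical stand-in for a backing device; not the kernel's struct. */
struct toy_bdi {
	bool write_congested;	/* what bdi_write_congested() would report */
};

/* Hypothetical stand-in for the reclaim request being serviced. */
struct toy_scan {
	int order;		/* allocation order that triggered reclaim */
	bool sync_migration;	/* models sync==true handed to migrate_pages() */
};

/*
 * Direct reclaim, as described in the mail: do not start blocking I/O
 * against a congested device.  Assumption: requests above the costly
 * order skip the congestion test entirely.
 */
static bool toy_may_write_to_queue(const struct toy_bdi *bdi,
				   const struct toy_scan *sc)
{
	if (!bdi->write_congested)
		return true;
	if (sc->order > TOY_COSTLY_ORDER)
		return true;
	return false;
}

/*
 * Compaction/migration, as described in the mail: the congestion state
 * is never consulted, so a sync caller can block behind the slow queue.
 */
static bool toy_migration_may_block(const struct toy_bdi *bdi,
				    const struct toy_scan *sc)
{
	(void)bdi;	/* deliberately unused: no congestion test here */
	return sc->sync_migration;
}
```

With a congested device and an order-0 request, toy_may_write_to_queue() refuses the write while toy_migration_may_block() still reports that the caller may block, which is the asymmetry the mail points at.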