On Wed, Nov 16, 2011 at 05:13:50AM +0100, Andrea Arcangeli wrote:
> After checking my current thp vmstat I think Andrew was right and we
> backed out for a good reason before. I'm getting significantly worse
> success rate, not sure why it was a small reduction in success rate
> but hey I cannot exclude I may have broke something with some other
> patch. I've been running it together with a couple more changes. If
> it's this change that reduced the success rate, I'm afraid going
> always async is not ok.

I wonder if the high failure rate when shutting off "sync compaction"
and forcing only "async compaction" for THP (your patch queued in -mm)
is also caused by ISOLATE_CLEAN being set in compaction by commit
39deaf8. ISOLATE_CLEAN skipping PageDirty means that all tmpfs/anon
pages added to swapcache (or removed from swapcache, which sets the
dirty bit on the page because the pte may be mapped clean) are skipped
entirely by async compaction for no good reason.

That can't possibly be ok, because those pages don't actually require
any I/O or blocking to be migrated. PageDirty implies a blocking/IO
operation only for file-backed pages. So I think we must revert
39deaf8, instead of cleaning it up with my cleanup posted in Message-Id
20111115020831.GF4414@xxxxxxxxxx.

ISOLATE_CLEAN still looks right for may_writepage: for reclaim, a dirty
bit set on the page is an I/O event; for migration it is not, if the
page is tmpfs/anon.

Did you run your compaction tests with some swap activity? Reducing
async compaction effectiveness while there is swap activity also leads
to running sync compaction and page reclaim more often than needed.

I'm hopeful, however, that by running just 2 passes of the
migrate_pages main loop with the "avoid overwork in migrate sync mode"
patch, we can fix the excessive hanging. If that works, the number of
passes could actually be a tunable, and setting it to 1 (instead of 2)
would then provide 100% "async compaction" behavior again.
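To make the ISOLATE_CLEAN argument concrete, here is a minimal userspace sketch, not kernel code. It assumes a hypothetical, simplified page model (struct page_model with a dirty bit and a swap_backed bit standing in for PageDirty and PageSwapBacked) and hypothetical helper names. It counts pages that a "skip every dirty page" rule would skip even though migrating them needs no I/O, which is the case described above for dirty anon/tmpfs/swapcache pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page state; NOT the kernel's struct page. */
struct page_model {
	bool dirty;       /* stands in for PageDirty */
	bool swap_backed; /* anon, tmpfs or swapcache page (stands in for PageSwapBacked) */
};

/* Rule as described for commit 39deaf8: with ISOLATE_CLEAN set,
 * async compaction skips every dirty page, whatever backs it. */
bool skipped_by_isolate_clean(const struct page_model *p, bool isolate_clean)
{
	return isolate_clean && p->dirty;
}

/* Per the argument in the mail: a dirty bit means I/O (writeback)
 * before migration only for file-backed pages; a dirty
 * anon/tmpfs/swapcache page can be migrated without blocking. */
bool migration_needs_io(const struct page_model *p)
{
	return p->dirty && !p->swap_backed;
}

/* Count pages that would be skipped even though migrating them
 * requires no I/O at all. */
size_t count_needlessly_skipped(const struct page_model *pages, size_t n,
				bool isolate_clean)
{
	size_t count = 0;

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (skipped_by_isolate_clean(&pages[i], isolate_clean) &&
		    !migration_needs_io(&pages[i]))
			count++;
	}
	return count;
}
```

With a mix of clean anon, dirty anon, dirty file-backed and clean file-backed pages, only the dirty anon page is skipped needlessly; the dirty file-backed page is the one case where skipping actually avoids I/O.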
And if somebody prefers to stick to 10, he can: then he can do trylock
in pass 0, lock_page in pass 1, wait on writeback in pass 2, wait for
pins in pass 3, and finally migrate in pass 4 (something 2 passes alone
won't allow). So making the migrate passes/force threshold tunable
(maybe only for the new sync=2 migration mode) could be a good idea. Or
we could just return to sync true/false and have the migration tunable
affect everything, but that would also alter the reliability of
sys_move_pages and other NUMA users, where I guess 10 passes are ok.
This is why I added a sync=2 mode for migrate.
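One way to read the tunable-passes idea is as an escalation schedule. The sketch below is a hypothetical userspace model, not the migrate_pages() implementation: the enum values and block_level_for_pass() are invented for illustration, and the rule (pass 0 never blocks, each later pass permits one more kind of blocking, the final pass permits all of them) is an assumption chosen to match the behaviour described in this thread for 1, 2 and 10 passes.

```c
#include <assert.h>

/* Hypothetical escalation levels; each level also permits the ones below it. */
enum block_level {
	BLOCK_NONE      = 0, /* never sleep: trylock only */
	BLOCK_LOCK_PAGE = 1, /* may sleep in lock_page() */
	BLOCK_WRITEBACK = 2, /* may also wait for writeback to complete */
	BLOCK_PIN       = 3, /* may also wait for extra page pins to drop */
};

/* Hypothetical schedule: pass 0 never blocks, each later pass unlocks one
 * more kind of blocking, and the final pass unlocks everything. */
enum block_level block_level_for_pass(int pass, int nr_passes)
{
	assert(nr_passes >= 1 && pass >= 0 && pass < nr_passes);

	if (pass == 0)
		return BLOCK_NONE;
	if (pass == nr_passes - 1 || pass >= BLOCK_PIN)
		return BLOCK_PIN;
	return (enum block_level)pass;
}
```

Under this assumed schedule, nr_passes = 1 leaves only the non-blocking pass (pure async behavior), nr_passes = 2 gives "try, then force", and nr_passes = 10 escalates one step per pass: trylock, lock_page, writeback, pins, then every remaining pass may block on anything.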