On Mon, Mar 21, 2011 at 09:16:41PM +0100, Andrea Arcangeli wrote: > On Mon, Mar 21, 2011 at 12:05:40PM -0500, Alex Villacís Lasso wrote: > > El 21/03/11 11:37, Mel Gorman escribió: > > > On Mon, Mar 21, 2011 at 02:48:32PM +0100, Andrea Arcangeli wrote: > > > > > > Nothing bad jumped out at me. Lets see how it gets on with testing. > > > > > > Thanks > > > > > As with the previous patch, this one did not completely solve the freezing tasks issue. However, as with the previous patch, the freezes took longer to appear, and now lasted less (10 to 12 seconds instead of freezing until the end of the usb copy). > > > > I have attached the new sysrq-w trace to the bug report. > > migrate and compaction disappeared from the traces as we hoped > for. The THP allocations left throttles on writeback during reclaim > like any 4k allocation would do: > > [ 2629.256809] [<ffffffff810e43c3>] wait_on_page_writeback+0x1b/0x1d > [ 2629.256812] [<ffffffff810e5992>] shrink_page_list+0x134/0x478 > [ 2629.256815] [<ffffffff810e614f>] shrink_inactive_list+0x29f/0x39a > [ 2629.256818] [<ffffffff810dbd55>] ? zone_watermark_ok+0x1f/0x21 > [ 2629.256820] [<ffffffff810dfe81>] ? determine_dirtyable_memory+0x1d/0x27 > [ 2629.256823] [<ffffffff810e6849>] shrink_zone+0x362/0x464 > [ 2629.256827] [<ffffffff810e6c87>] do_try_to_free_pages+0xdd/0x2e3 > [ 2629.256830] [<ffffffff810e70eb>] try_to_free_pages+0xaa/0xef > [ 2629.256833] [<ffffffff810deede>] __alloc_pages_nodemask+0x4cc/0x772 > [ 2629.256837] [<ffffffff8110c0ea>] alloc_pages_vma+0xec/0xf1 > [ 2629.256840] [<ffffffff8111be94>] do_huge_pmd_anonymous_page+0xbf/0x267 > [ 2629.256844] [<ffffffff810f24a3>] ? pmd_offset+0x19/0x40 > [ 2629.256846] [<ffffffff810f5c7c>] handle_mm_fault+0x15d/0x20f > [ 2629.256850] [<ffffffff8100f298>] ? arch_get_unmapped_area_topdown+0x1c3/0x28f > [ 2629.256853] [<ffffffff814818cc>] do_page_fault+0x33b/0x35d > [ 2629.256856] [<ffffffff810fb089>] ? do_mmap_pgoff+0x29a/0x2f4 > [ 2629.256859] [<ffffffff8112dd66>] ? path_put+0x22/0x27 > [ 2629.256861] [<ffffffff8147f285>] page_fault+0x25/0x30 > There is an important difference between THP and generic order-0 reclaim though. Once defrag is enabled in THP, it can enter direct reclaim for reclaim/compaction where more pages may be claimed than for a base page fault thereby encountering more dirty pages and stalling. > They throttle on writeback I/O completion like kswapd too: > > [ 2849.098751] [<ffffffff8147d00b>] io_schedule+0x47/0x62 > [ 2849.098756] [<ffffffff8121c47b>] get_request_wait+0x10a/0x197 > [ 2849.098760] [<ffffffff8106cd77>] ? autoremove_wake_function+0x0/0x3d > [ 2849.098763] [<ffffffff8121cd3c>] __make_request+0x2c8/0x3e0 > [ 2849.098767] [<ffffffff81114889>] ? kmem_cache_alloc+0x73/0xeb > [ 2849.098771] [<ffffffff8121bbdf>] generic_make_request+0x2bc/0x336 > [ 2849.098774] [<ffffffff8121bd39>] submit_bio+0xe0/0xff > [ 2849.098777] [<ffffffff8114d7a5>] ? bio_alloc_bioset+0x4d/0xc4 > [ 2849.098781] [<ffffffff810edf2b>] ? inc_zone_page_state+0x2d/0x2f > [ 2849.098785] [<ffffffff811492ec>] submit_bh+0xe8/0x10e > [ 2849.098788] [<ffffffff8114ba72>] __block_write_full_page+0x1ea/0x2da > [ 2849.098793] [<ffffffffa06e5202>] ? udf_get_block+0x0/0x115 [udf] > [ 2849.098796] [<ffffffff8114a6b8>] ? end_buffer_async_write+0x0/0x12d > [ 2849.098799] [<ffffffff8114a6b8>] ? end_buffer_async_write+0x0/0x12d > [ 2849.098802] [<ffffffffa06e5202>] ? udf_get_block+0x0/0x115 [udf] > [ 2849.098805] [<ffffffff8114bbee>] block_write_full_page_endio+0x8c/0x98 > [ 2849.098808] [<ffffffff8114bc0f>] block_write_full_page+0x15/0x17 > [ 2849.098811] [<ffffffffa06e2027>] udf_writepage+0x18/0x1a [udf] > [ 2849.098814] [<ffffffff810e44fd>] pageout+0x138/0x255 > [ 2849.098817] [<ffffffff810e5ad7>] shrink_page_list+0x279/0x478 > [ 2849.098820] [<ffffffff810e60ec>] shrink_inactive_list+0x23c/0x39a > [ 2849.098824] [<ffffffff81481a46>] ? add_preempt_count+0xae/0xb2 > [ 2849.098828] [<ffffffff810dfe81>] ? determine_dirtyable_memory+0x1d/0x27 > [ 2849.098831] [<ffffffff810e6849>] shrink_zone+0x362/0x464 > [ 2849.098834] [<ffffffff810dbdf8>] ? zone_watermark_ok_safe+0xa1/0xae > [ 2849.098837] [<ffffffff810e773f>] kswapd+0x51c/0x89f > > I'm unsure if there's any other problem left that can be attributed to > compaction/migrate (especially considering the THP allocations have no > __GFP_REPEAT set and should_continue_reclaim should break the loop if > nr_reclaim is zero, plus compaction_suitable requires not much more > memory to be reclaimed if compared to no-compaction). > I think we are breaking out because the report says the stalls aren't as bad but not before we have waited on writeback of a few dirty pages. This could be addressed in a number of ways but all of them impact THP in some way. 1. We could disable defrag by default. This will avoid the stalling at the cost of fewer pages being promoted even when plenty of clean pages were available. 2. We could redefine __GFP_NO_KSWAPD as __GFP_ASYNC to mean a) do not wake up kswapd that generates IO possibly causing syncs later b) does not queue any pages for IO itself and c) never waits on page writeback. This would also avoid stalls but it would disrupt LRU ordering by reclaiming younger pages than would otherwise have been reclaimed. 3. Again redefine __GFP_NO_KSWAPD but abort allocation if any dirty or writeback page is encountered during reclaim. This makes the assumption that dirty pages at the end of the LRU implies memory is under enough pressure to not care about promotion. This will also result in THP promoting fewer pages but has less impact on LRU ordering. Which would you prefer? Other suggestions? -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>