On 4/24/19 7:35 AM, Vlastimil Babka wrote: > On 4/23/19 6:39 PM, Mike Kravetz wrote: >>> That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It >>> looks like there is something wrong in the reclaim going on. >> >> Ok, I will start digging into that. Just wanted to make sure before I got >> into it too deep. >> >> BTW - This is very easy to reproduce. Just try to allocate more huge pages >> than will fit into memory. I see this 'reclaim taking forever' behavior on >> v5.1-rc5-mmotm-2019-04-19-14-53. Looks like it was there in v5.0 as well. > > I'd suspect this in should_continue_reclaim(): > > /* Consider stopping depending on scan and reclaim activity */ > if (sc->gfp_mask & __GFP_RETRY_MAYFAIL) { > /* > * For __GFP_RETRY_MAYFAIL allocations, stop reclaiming if the > * full LRU list has been scanned and we are still failing > * to reclaim pages. This full LRU scan is potentially > * expensive but a __GFP_RETRY_MAYFAIL caller really wants to succeed > */ > if (!nr_reclaimed && !nr_scanned) > return false; > > And that for some reason, nr_scanned never becomes zero. But it's hard > to figure out through all the layers of functions :/ I got back to looking into the direct reclaim/compaction stalls when trying to allocate huge pages. As previously mentioned, the code is looping for a long time in shrink_node(). The routine should_continue_reclaim() returns true perhaps more often than it should. As Vlastmil guessed, my debug code output below shows nr_scanned is remaining non-zero for quite a while. This was on v5.2-rc6. To help determine what is happening, I added a counter to determine how many times should_continue_reclaim was returning true within one calling context from shrink_node. Every 1,000,000 calls, I set a 'debug flag' to get a little more information about what is happening before the next call. Here is output from debug code with some comments about what the debug code is showing. [ 1477.583859] ***should_continue_reclaim: setting debug_nr_scanned call 1000000 [ 1477.584698] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ... shrink_node calls shrink_node_memcg which calls shrink_list. shrink_list can increment sc->nr_scanned which is used to compute the value of nr_scanned passed to should_continue_reclaim. Here, we see that only one page is isolated for potential reclaim. [ 1477.585465] shrink_page_list: goto activate_locked 4 shrink_page_list determines that the page can not be reclaimed. The following code in shrink_page_list determines this. if (page_has_private(page)) { if (!try_to_release_page(page, sc->gfp_mask)) { /* Obviously, my debug code. */ if (sc->debug_nr_scanned) printk("shrink_page_list: goto activate_locked 4\n"); goto activate_locked; } [ 1477.586044] shrink_inactive_list: nr_reclaimed = 0 The bottom line is that page is not reclaimed. But, it was 'scanned' so .. [ 1477.586627] ***should_continue_reclaim: sc->nr_reclaimed 9 pages_for_compaction 1024 [ 1477.587546] nr_reclaimed 0, nr_scanned 1 should_continue_reclaim is called with nr_reclaimed 0, nr_scanned 1 [ 1477.588124] inactive_lru_pages 1 sc->nr_scanned 1000001 Since sc->nr_scanned is 1000001 for 1000001 calls, it looks like we scanned one page each time. Additional similar output without comments. [ 1511.200515] ***should_continue_reclaim: setting debug_nr_scanned call 2000000 [ 1511.201581] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ... [ 1511.202615] shrink_page_list: goto activate_locked 4 [ 1511.203561] shrink_inactive_list: nr_reclaimed = 0 [ 1511.204422] ***should_continue_reclaim: sc->nr_reclaimed 10 pages_for_compaction 1024 [ 1511.205785] nr_reclaimed 0, nr_scanned 1 [ 1511.206569] inactive_lru_pages 1 sc->nr_scanned 2000001 [ 1543.982310] ***should_continue_reclaim: setting debug_nr_scanned call 3000000 [ 1543.983692] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ... [ 1543.984645] shrink_page_list: goto activate_locked 4 [ 1543.985405] shrink_inactive_list: nr_reclaimed = 0 [ 1543.986386] ***should_continue_reclaim: sc->nr_reclaimed 11 pages_for_compaction 1024 [ 1543.987662] nr_reclaimed 0, nr_scanned 1 [ 1543.988423] inactive_lru_pages 1 sc->nr_scanned 3000001 [ 1577.104923] ***should_continue_reclaim: setting debug_nr_scanned call 4000000 [ 1577.105857] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ... [ 1577.106752] shrink_page_list: goto activate_locked 4 [ 1577.107454] shrink_inactive_list: nr_reclaimed = 0 [ 1577.108163] ***should_continue_reclaim: sc->nr_reclaimed 11 pages_for_compaction 1024 [ 1577.109423] nr_reclaimed 0, nr_scanned 1 [ 1577.110205] inactive_lru_pages 1 sc->nr_scanned 4000001 [ 1614.075236] ***should_continue_reclaim: setting debug_nr_scanned call 5000000 [ 1614.076375] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ... [ 1614.077410] shrink_page_list: goto activate_locked 4 [ 1614.078210] shrink_inactive_list: nr_reclaimed = 0 [ 1614.078913] ***should_continue_reclaim: sc->nr_reclaimed 13 pages_for_compaction 1024 [ 1614.079907] nr_reclaimed 0, nr_scanned 1 [ 1614.080496] inactive_lru_pages 1 sc->nr_scanned 5000001 [ 1650.360466] ***should_continue_reclaim: setting debug_nr_scanned call 6000000 [ 1650.361342] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ... [ 1650.362147] shrink_page_list: goto activate_locked 4 [ 1650.362759] shrink_inactive_list: nr_reclaimed = 0 [ 1650.363685] ***should_continue_reclaim: sc->nr_reclaimed 13 pages_for_compaction 1024 [ 1650.364648] nr_reclaimed 0, nr_scanned 1 [ 1650.365222] inactive_lru_pages 0 sc->nr_scanned 6000001 [ 1685.061380] ***should_continue_reclaim: setting debug_nr_scanned call 7000000 [ 1685.062529] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ... [ 1685.063615] shrink_page_list: goto activate_locked 4 [ 1685.064439] shrink_inactive_list: nr_reclaimed = 0 [ 1685.065244] ***should_continue_reclaim: sc->nr_reclaimed 14 pages_for_compaction 1024 [ 1685.066536] nr_reclaimed 0, nr_scanned 1 [ 1685.067356] inactive_lru_pages 1 sc->nr_scanned 7000001 [ 1719.103176] ***should_continue_reclaim: setting debug_nr_scanned call 8000000 [ 1719.104206] shrink_inactive_list: nr_taken 1 = isolate_lru_pages(1, ... [ 1719.105117] shrink_page_list: goto activate_locked 4 [ 1719.105781] shrink_inactive_list: nr_reclaimed = 0 [ 1719.106475] ***should_continue_reclaim: sc->nr_reclaimed 15 pages_for_compaction 1024 [ 1719.107499] nr_reclaimed 0, nr_scanned 1 [ 1719.108129] inactive_lru_pages 1 sc->nr_scanned 8000001 [ 1743.911025] ***should_continue_reclaim: false after 8711717 calls !nr_reclaimed && !nr_scanned Note that we do reclaim a 'few' pages along the way. After 8711717 calls should_continue_reclaim finally hits the condition where (!nr_reclaimed && !nr_scanned) and returns false. We were stuck looping in shrink node for about 5 minutes. Any additional insight, suggestions for additional debug, etc. would be appreciated. -- Mike Kravetz