Re: [Question] Should direct reclaim time be bounded?

Mike Kravetz <mike.kravetz@xxxxxxxxxx> · Fri, 28 Jun 2019 11:20:42 -0700

On 4/24/19 7:35 AM, Vlastimil Babka wrote:
> On 4/23/19 6:39 PM, Mike Kravetz wrote:
>>> That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It
>>> looks like there is something wrong in the reclaim going on.
>>
>> Ok, I will start digging into that.  Just wanted to make sure before I got
>> into it too deep.
>>
>> BTW - This is very easy to reproduce.  Just try to allocate more huge pages
>> than will fit into memory.  I see this 'reclaim taking forever' behavior on
>> v5.1-rc5-mmotm-2019-04-19-14-53.  Looks like it was there in v5.0 as well.
> 
> I'd suspect this in should_continue_reclaim():
> 
>         /* Consider stopping depending on scan and reclaim activity */
>         if (sc->gfp_mask & __GFP_RETRY_MAYFAIL) {
>                 /*
>                  * For __GFP_RETRY_MAYFAIL allocations, stop reclaiming if the
>                  * full LRU list has been scanned and we are still failing
>                  * to reclaim pages. This full LRU scan is potentially
>                  * expensive but a __GFP_RETRY_MAYFAIL caller really wants to succeed
>                  */
>                 if (!nr_reclaimed && !nr_scanned)
>                         return false;
> 
> And that for some reason, nr_scanned never becomes zero. But it's hard
> to figure out through all the layers of functions :/

I got back to looking into the direct reclaim/compaction stalls when
trying to allocate huge pages.  As previously mentioned, the code is
looping for a long time in shrink_node().  The routine
should_continue_reclaim() returns true perhaps more often than it should.

As Vlastmil guessed, my debug code output below shows nr_scanned is remaining
non-zero for quite a while.  This was on v5.2-rc6.

To help determine what is happening, I added a counter to determine how many
times should_continue_reclaim was returning true within one calling context
from shrink_node.  Every 1,000,000 calls, I set a 'debug flag' to get a little
more information about what is happening before the next call.  Here is output
from debug code with some comments about what the debug code is showing.

[ 1477.583859] ***should_continue_reclaim: setting debug_nr_scanned call 1000000
[ 1477.584698] shrink_inactive_list: nr_taken 1 =  isolate_lru_pages(1, ...

shrink_node calls shrink_node_memcg which calls shrink_list.  shrink_list
can increment sc->nr_scanned which is used to compute the value of nr_scanned
passed to should_continue_reclaim.  Here, we see that only one page is
isolated for potential reclaim.

[ 1477.585465] shrink_page_list: goto activate_locked 4

shrink_page_list determines that the page can not be reclaimed.  The following
code in shrink_page_list determines this.

                if (page_has_private(page)) {
                        if (!try_to_release_page(page, sc->gfp_mask))
{
/* Obviously, my debug code. */
if (sc->debug_nr_scanned)
  printk("shrink_page_list: goto activate_locked 4\n");
                                goto activate_locked;
}

[ 1477.586044] shrink_inactive_list: nr_reclaimed = 0

The bottom line is that page is not reclaimed.  But, it was 'scanned' so ..

[ 1477.586627] ***should_continue_reclaim: sc->nr_reclaimed 9 pages_for_compaction  1024
[ 1477.587546]            nr_reclaimed 0, nr_scanned 1

should_continue_reclaim is called with nr_reclaimed 0, nr_scanned 1

[ 1477.588124]            inactive_lru_pages 1 sc->nr_scanned 1000001

Since sc->nr_scanned is 1000001 for 1000001 calls, it looks like we scanned
one page each time.

Additional similar output without comments.

[ 1511.200515] ***should_continue_reclaim: setting debug_nr_scanned call 2000000
[ 1511.201581] shrink_inactive_list: nr_taken 1 =  isolate_lru_pages(1, ...
[ 1511.202615] shrink_page_list: goto activate_locked 4
[ 1511.203561] shrink_inactive_list: nr_reclaimed = 0
[ 1511.204422] ***should_continue_reclaim: sc->nr_reclaimed 10 pages_for_compaction  1024
[ 1511.205785]            nr_reclaimed 0, nr_scanned 1
[ 1511.206569]            inactive_lru_pages 1 sc->nr_scanned 2000001
[ 1543.982310] ***should_continue_reclaim: setting debug_nr_scanned call 3000000
[ 1543.983692] shrink_inactive_list: nr_taken 1 =  isolate_lru_pages(1, ...
[ 1543.984645] shrink_page_list: goto activate_locked 4
[ 1543.985405] shrink_inactive_list: nr_reclaimed = 0
[ 1543.986386] ***should_continue_reclaim: sc->nr_reclaimed 11 pages_for_compaction  1024
[ 1543.987662]            nr_reclaimed 0, nr_scanned 1
[ 1543.988423]            inactive_lru_pages 1 sc->nr_scanned 3000001
[ 1577.104923] ***should_continue_reclaim: setting debug_nr_scanned call 4000000
[ 1577.105857] shrink_inactive_list: nr_taken 1 =  isolate_lru_pages(1, ...
[ 1577.106752] shrink_page_list: goto activate_locked 4
[ 1577.107454] shrink_inactive_list: nr_reclaimed = 0
[ 1577.108163] ***should_continue_reclaim: sc->nr_reclaimed 11 pages_for_compaction  1024
[ 1577.109423]            nr_reclaimed 0, nr_scanned 1
[ 1577.110205]            inactive_lru_pages 1 sc->nr_scanned 4000001
[ 1614.075236] ***should_continue_reclaim: setting debug_nr_scanned call 5000000
[ 1614.076375] shrink_inactive_list: nr_taken 1 =  isolate_lru_pages(1, ...
[ 1614.077410] shrink_page_list: goto activate_locked 4
[ 1614.078210] shrink_inactive_list: nr_reclaimed = 0
[ 1614.078913] ***should_continue_reclaim: sc->nr_reclaimed 13 pages_for_compaction  1024
[ 1614.079907]            nr_reclaimed 0, nr_scanned 1
[ 1614.080496]            inactive_lru_pages 1 sc->nr_scanned 5000001
[ 1650.360466] ***should_continue_reclaim: setting debug_nr_scanned call 6000000
[ 1650.361342] shrink_inactive_list: nr_taken 1 =  isolate_lru_pages(1, ...
[ 1650.362147] shrink_page_list: goto activate_locked 4
[ 1650.362759] shrink_inactive_list: nr_reclaimed = 0
[ 1650.363685] ***should_continue_reclaim: sc->nr_reclaimed 13 pages_for_compaction  1024
[ 1650.364648]            nr_reclaimed 0, nr_scanned 1
[ 1650.365222]            inactive_lru_pages 0 sc->nr_scanned 6000001
[ 1685.061380] ***should_continue_reclaim: setting debug_nr_scanned call 7000000
[ 1685.062529] shrink_inactive_list: nr_taken 1 =  isolate_lru_pages(1, ...
[ 1685.063615] shrink_page_list: goto activate_locked 4
[ 1685.064439] shrink_inactive_list: nr_reclaimed = 0
[ 1685.065244] ***should_continue_reclaim: sc->nr_reclaimed 14 pages_for_compaction  1024
[ 1685.066536]            nr_reclaimed 0, nr_scanned 1
[ 1685.067356]            inactive_lru_pages 1 sc->nr_scanned 7000001
[ 1719.103176] ***should_continue_reclaim: setting debug_nr_scanned call 8000000
[ 1719.104206] shrink_inactive_list: nr_taken 1 =  isolate_lru_pages(1, ...
[ 1719.105117] shrink_page_list: goto activate_locked 4
[ 1719.105781] shrink_inactive_list: nr_reclaimed = 0
[ 1719.106475] ***should_continue_reclaim: sc->nr_reclaimed 15 pages_for_compaction  1024
[ 1719.107499]            nr_reclaimed 0, nr_scanned 1
[ 1719.108129]            inactive_lru_pages 1 sc->nr_scanned 8000001
[ 1743.911025] ***should_continue_reclaim: false after 8711717 calls !nr_reclaimed && !nr_scanned

Note that we do reclaim a 'few' pages along the way.  After 8711717 calls
should_continue_reclaim finally hits the condition where (!nr_reclaimed &&
!nr_scanned) and returns false.  We were stuck looping in shrink node for
about 5 minutes.

Any additional insight, suggestions for additional debug, etc. would be
appreciated.
-- 
Mike Kravetz