On 9/7/21 1:50 AM, Hillf Danton wrote:
> On Mon, 6 Sep 2021 16:40:28 +0200 Vlastimil Babka wrote:
>> On 9/2/21 20:17, Mike Kravetz wrote:
>>>
>>> Here is some very high level information from a long stall that was
>>> interrupted.  This was an order 9 allocation from alloc_buddy_huge_page().
>>>
>>> [55269.530564] __alloc_pages_slowpath: jiffies 47329325 tries 609673 cpu_tries 1 node 0 FAIL
>>> [55269.539893]   r_tries 25 c_tries 609647 reclaim 47325161 compact 607
>>>
>>> Yes, it was in __alloc_pages_slowpath for 47329325 jiffies before being
>>> interrupted.  should_reclaim_retry returned true 25 times and
>>> should_compact_retry returned true 609647 times.
>>> Almost all of the time (47325161 jiffies) was spent in
>>> __alloc_pages_direct_reclaim, and 607 jiffies were spent in
>>> __alloc_pages_direct_compact.
>>>
>>> Looks like both
>>>     reclaim retries > MAX_RECLAIM_RETRIES
>>> and
>>>     compaction retries > MAX_COMPACT_RETRIES
>>> were exceeded.
>>>
>> Yeah AFAICS that's only possible with the scenario I suspected. I guess
>> we should put a limit on compact retries (maybe some multiple of
>> MAX_COMPACT_RETRIES) even if it thinks that reclaim could help, while
>> clearly it doesn't (i.e. because somebody else is stealing the page like
>> in your test case).
>
> And/or clamp reclaim retries for costly orders
>
> 	reclaim retries = MAX_RECLAIM_RETRIES - order;
>
> to pull down the chance for stall as low as possible.

Thanks, and sorry for not replying quickly.  I only get back to this as
time allows.

We could clamp the number of compaction and reclaim retries in
__alloc_pages_slowpath as suggested.  However, I noticed that a single
reclaim call could take a significant amount of time, so I instrumented
shrink_node to see what might be happening.  Here is some information
from a long stall.  Note that I only dump stats when jiffies > 100000.

[ 8136.874706] shrink_node: 507654 total jiffies, 3557110 tries
[ 8136.881130]   130596341 reclaimed, 32 nr_to_reclaim
[ 8136.887643]   compaction_suitable results:
[ 8136.893276]   idx COMPACT_SKIPPED, 3557109
[ 8672.399839] shrink_node: 522076 total jiffies, 3466228 tries
[ 8672.406268]   124427720 reclaimed, 32 nr_to_reclaim
[ 8672.412782]   compaction_suitable results:
[ 8672.418421]   idx COMPACT_SKIPPED, 3466227
[ 8908.099592] __alloc_pages_slowpath: jiffies 2939938 tries 17068 cpu_tries 1 node 0 success
[ 8908.109120]   r_tries 11 c_tries 17056 reclaim 2939865 compact 9

In this case, clamping the number of retries from should_compact_retry
and should_reclaim_retry could help (a rough sketch of that idea is
appended below), presumably because we would no longer be calling back
into the reclaim code.  Notice the long amount of time spent in
shrink_node.

The 'tries' in shrink_node come from this check:

	if (should_continue_reclaim(pgdat, sc->nr_reclaimed - nr_reclaimed, sc))
		goto again;

The compaction_suitable results are the values returned from the calls
to should_continue_reclaim -> compaction_suitable.

Still trying to think of an intelligent way to quit this loop early.
-- 
Mike Kravetz
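
P.S. To make the clamping ideas above concrete, here is a minimal
standalone sketch.  This is not actual kernel code; the helper names,
the 4 * MAX_COMPACT_RETRIES cap and the floor of one reclaim retry are
just illustrative assumptions.  The constants mirror the values in
mm/page_alloc.c (both 16).

	#include <stdbool.h>

	#define MAX_RECLAIM_RETRIES	16
	#define MAX_COMPACT_RETRIES	16

	/*
	 * Hypothetical helper: hard-cap compaction retries at a multiple
	 * of MAX_COMPACT_RETRIES, even if reclaim progress suggests that
	 * another retry could still help.
	 */
	static bool compact_retry_allowed(int compaction_retries,
					  bool reclaim_says_retry)
	{
		if (compaction_retries > 4 * MAX_COMPACT_RETRIES)
			return false;	/* give up regardless of reclaim feedback */

		return reclaim_says_retry ||
		       compaction_retries <= MAX_COMPACT_RETRIES;
	}

	/*
	 * Hypothetical helper: scale reclaim retries down for costly
	 * orders (MAX_RECLAIM_RETRIES - order), with a floor of one.
	 */
	static int reclaim_retries_for_order(unsigned int order)
	{
		int retries = MAX_RECLAIM_RETRIES - (int)order;

		return retries > 0 ? retries : 1;
	}

For an order 9 allocation this would allow at most 7 reclaim retries
and 64 compaction retries, which at least bounds the slowpath loop even
when another task keeps stealing the freed/compacted pages.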