On 8/28/21 01:04, Mike Kravetz wrote:
> On 8/27/21 10:22 AM, Vlastimil Babka wrote:
> I 'may' have been over stressing the system with all CPUs doing file
> reads to fill the page cache with clean pages. I certainly need to
> spend some more debug/analysis time on this.

Hm, that *could* play a role, as these will allow reclaim to make
progress. But the reclaimed pages might also be stolen immediately, so
compaction will return COMPACT_SKIPPED, and in should_compact_retry() we
might go through this code path:

	/*
	 * compaction was skipped because there are not enough order-0 pages
	 * to work with, so we retry only if it looks like reclaim can help.
	 */
	if (compaction_needs_reclaim(compact_result)) {
		ret = compaction_zonelist_suitable(ac, order, alloc_flags);
		goto out;
	}

where compaction_zonelist_suitable() will return true because it appears
reclaim can free pages to allow progress. And there are no max retries
applied for this case.

With the reclaim and compaction tracepoints it should be possible to
confirm this scenario.