On Thu, Jan 9, 2025 at 12:31 AM <yangge1116@xxxxxxx> wrote:
>
> From: yangge <yangge1116@xxxxxxx>
>
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory. I have configured 16GB of CMA memory on each NUMA node,
> and starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.
>
> During the start-up of the virtual machine, it will call
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
> Long-term GUP cannot allocate memory from the CMA area, so a maximum of
> 16GB of non-CMA memory on a NUMA node can be used as virtual machine
> memory. There is 16GB of free CMA memory on a NUMA node, which is
> sufficient to pass the order-0 watermark check, causing the
> __compaction_suitable() function to consistently return true.
> However, if there aren't enough migratable pages available, performing
> memory compaction is also meaningless. Besides checking whether
> the order-0 watermark is met, __compaction_suitable() also needs
> to determine whether there are sufficient migratable pages available
> for memory compaction.
>
> For costly allocations, because __compaction_suitable() always
> returns true, __alloc_pages_slowpath() can't exit at the appropriate
> place, resulting in excessively long virtual machine startup times.
> Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage; // should exit __alloc_pages_slowpath() from here
>
> When the 16GB of non-CMA memory on a single node is exhausted, we will
> fall back to allocating memory on other nodes. In order to quickly
> fall back to remote nodes, we should skip memory compaction when
> migratable pages are insufficient. After this fix, it only takes a
> few tens of seconds to start a 32GB virtual machine with device
> passthrough functionality.
>
> Signed-off-by: yangge <yangge1116@xxxxxxx>
> ---
>
> V3:
> - fix build error
>
> V2:
> - consider unevictable folios
>
>  mm/compaction.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 07bd227..a9f1261 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct zone *zone, int order,
>  				  int highest_zoneidx,
>  				  unsigned long wmark_target)
>  {
> +	pg_data_t __maybe_unused *pgdat = zone->zone_pgdat;
> +	unsigned long sum, nr_pinned;
>  	unsigned long watermark;
> +
> +	sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
> +		node_page_state(pgdat, NR_INACTIVE_ANON) +
> +		node_page_state(pgdat, NR_ACTIVE_FILE) +
> +		node_page_state(pgdat, NR_ACTIVE_ANON) +
> +		node_page_state(pgdat, NR_UNEVICTABLE);
> +
> +	nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
> +		node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
> +

Does the sum of all LRU pages equal the amount of non-CMA memory? I'm
quite confused, for two reasons:
1. CMA pages can be LRU pages.
2. Free pages don't belong to any LRU.

> +	/*
> +	 * Gup-pinned pages are non-migratable. After subtracting these pages,
> +	 * we need to check if the remaining pages are sufficient for memory
> +	 * compaction.
> +	 */
> +	if ((sum - nr_pinned) < (1 << order))
> +		return false;
> +
>  	/*
>  	 * Watermarks for order-0 must be met for compaction to be able to
>  	 * isolate free pages for migration targets. This means that the
> --
> 2.7.4
>

Thanks
Barry