Re: [RFC PATCH 0/7] mm: providing ample physical memory contiguity by confining unmovable allocations

Zi Yan <ziy@xxxxxxxxxx> · Tue, 19 Mar 2024 22:47:50 -0400

On 19 Mar 2024, at 22:42, kaiyang2@xxxxxxxxxx wrote:

> From: Kaiyang Zhao <kaiyang2@xxxxxxxxxx>
>
> Memory capacity has increased dramatically over the last decades.
> Meanwhile, TLB capacity has stagnated, causing a significant virtual
> address translation overhead. As a collaboration between Carnegie Mellon
> University and Meta, we investigated the issue at Meta’s datacenters and
> found that about 20% of CPU cycles are spent doing page walks [1], and
> similar results are also reported by Google [2].
>
> To tackle the overhead, we need widespread use of huge pages. And huge
> pages, when they can actually be created, work wonders: they provide up
> to 18% higher performance for Meta’s production workloads in our
> experiments [1].
>
> However, we observed that huge pages through THP are unreliable because
> sufficient physical contiguity may not exist and compaction to recover
> from memory fragmentation frequently fails. To ensure workloads get a
> reasonable number of huge pages, Meta could not rely on THP and had to
> use reserved huge pages. Proposals to add 1GB THP support [5] are even
> more dependent on ample availability of physical contiguity.
>
> A major reason for the lack of physical contiguity is the mixing of
> unmovable and movable allocations, causing compaction to fail. Quoting
> from [3], “in a broad sample of Meta servers, we find that unmovable
> allocations make up less than 7% of total memory on average, yet occupy
> 34% of the 2M blocks in the system. We also found that this effect isn't
> correlated with high uptimes, and that servers can get heavily
> fragmented within the first hour of running a workload.”
>
> Our proposed solution is to confine the unmovable allocations to a
> separate region in physical memory. We experimented with using a CMA
> region for the movable allocations, but in this version we use
> ZONE_MOVABLE for movable and all other zones for unmovable allocations.
> Movable allocations can temporarily reside in the unmovable zones, but
> will be proactively moved out by compaction.
>
> To resize ZONE_MOVABLE, we still rely on memory hotplug interfaces. We
> export the number of pages scanned on behalf of movable or unmovable
> allocations during reclaim to approximate the memory pressure in two
> parts of physical memory, and a userspace tool can monitor the metrics
> and make resizing decisions. Previously we augmented the PSI interface
> to break down memory pressure into movable and unmovable allocation
> types, but that approach enlarges the scheduler cacheline footprint.
> From our preliminary observations, the per-allocation-type scanned
> counters, with a little tuning, are sufficient to tell whether there is
> not enough memory for unmovable allocations and to make resizing
> decisions.
>
> This patch series extends the idea of migratetype isolation at pageblock
> granularity posted earlier [3] by Johannes Weiner to an
> as-large-as-needed region to better support huge pages of bigger sizes
> and hardware TLB coalescing. We’re looking for feedback on the overall
> direction, particularly in relation to the recent THP allocator
> optimization proposal [4].
>
> The patches are based on 6.4 and are also available on github at
> https://github.com/magickaiyang/kernel-contiguous/tree/per_alloc_type_reclaim_counters_oct052023

Your reference links (1 to 4) are missing.

--
Best Regards,
Yan, Zi