Re: [LSF/MM/BPF TOPIC] Userspace managed memory tiering

On Fri, Jun 18, 2021 at 10:50 AM Wei Xu <weixugc@xxxxxxxxxx> wrote:
>
> In this proposal, I'd like to discuss userspace-managed memory tiering
> and the kernel support that it needs.
>
> New memory technologies and interconnect standards make it possible
> to have memory with different performance and cost on the same
> machine (e.g. DRAM + PMEM, DRAM + cost-optimized memory attached via
> CXL.mem).  We can expect heterogeneous memory systems, with
> performance implications far beyond classical NUMA, to become
> increasingly common.  One important use case of such tiered memory
> systems is to improve data center and cloud efficiency through
> better performance/TCO.
>
> Because different classes of applications (e.g. latency sensitive vs
> latency tolerant, high priority vs low priority) have different
> requirements, richer and more flexible memory tiering policies will
> be needed to achieve the desired performance targets on a tiered
> memory system, and such policies can be managed more effectively by
> a userspace agent than by the kernel.  Moreover, we (Google) are
> explicitly trying to avoid adding a ton of heuristics to enlighten
> the kernel about the policies we want on multi-tenant machines, when
> userspace offers more flexibility.
>
> To manage memory tiering in userspace, we need kernel support in
> three key areas:
>
> - resource abstraction and control of tiered memory;
> - API to monitor page accesses for making memory tiering decisions;
> - API to migrate pages (demotion/promotion).
>
> Userspace memory tiering can work on just NUMA memory nodes, provided
> that memory resources from different tiers are abstracted into
> separate NUMA nodes.  The userspace agent can create a tiering
> topology among these nodes based on their distances.
>
> An explicit memory tiering abstraction in the kernel is preferred,
> though, because it not only allows the kernel to react in cases that
> are challenging for userspace (e.g. reclaim-based demotion when the
> system is under DRAM pressure due to a usage surge), but also enables
> tiering controls such as per-cgroup memory tier limits.
> This requirement is mostly aligned with the existing proposals [1]
> and [2].
>
> The userspace agent manages all migratable user memory on the
> system, which can be transparent to applications.  To demote cold
> pages and promote hot pages, the userspace agent needs page access
> information.  Because this is system-wide tiering of user memory,
> the access information for both mapped and unmapped user pages is
> needed, as are the physical page addresses.  A combination of
> page-table accessed-bit scanning and struct page scanning will be
> needed.  Such page access monitoring must also be efficient, because
> the scans can be frequent.  To return the page-level access
> information to userspace, one proposal is to use tracepoint events.
> The userspace agent can then use BPF programs to collect such data
> and apply customized filters when necessary.

Just FYI: there has been a project for such a userspace daemon. Please
see https://github.com/fengguang/memory-optimizer

We (Alibaba, when I was there) did some preliminary tests and
benchmarks with it. The accuracy was pretty good, but the overhead was
relatively high. I agree with you that efficiency is key; BPF may be a
good approach to reduce that overhead.

I'm not sure what the current status of this project is. You may reach
out to Huang Ying for more information.

>
> The userspace agent can also make use of hardware PMU events, for
> which the existing kernel support should be sufficient.
>
> The third area is API support for migrating pages.  The existing
> move_pages() syscall is one candidate, though it is virtual-address
> based and cannot migrate unmapped pages.  Would a physical-address
> based variant (e.g. move_pfns()) be an acceptable proposal?
>
> [1] https://lore.kernel.org/lkml/9cd0dcde-f257-1b94-17d0-f2e24a3ce979@xxxxxxxxx/
> [2] https://lore.kernel.org/patchwork/cover/1408180/
>
> Thanks,
> Wei
>



