Re: [RFC PATCH 2/6] mm/gmem: add arch-independent abstraction to track address mapping status

On 28.11.23 13:50, Weixi Zhu wrote:
This patch adds an abstraction layer, struct vm_object, that maintains
per-process virtual-to-physical mapping status stored in struct gm_mapping.
For example, a virtual page may be mapped to a CPU physical page or to a
device physical page. Struct vm_object effectively maintains an
arch-independent page table, which is defined as a "logical page table".
While arch-dependent page table used by a real MMU is named a "physical
page table". The logical page table is useful if Linux core MM is extended
to handle a unified virtual address space with external accelerators using
customized MMUs.

Which raises the question: why are we dealing with anonymous memory at all? Why not go for shmem, given that you are already only special-casing VMAs with an MMAP flag right now?

That might avoid having to introduce controversial BSD design concepts into Linux, which to me feel like going a step backwards in time and adding *more* MM complexity.


In this patch, struct vm_object utilizes a radix
tree (xarray) to track where a virtual page is mapped to. This adds extra
memory consumption from xarray, but provides a nice abstraction to isolate
mapping status from the machine-dependent layer (PTEs). Besides supporting
accelerators with external MMUs, struct vm_object is planned to further
union with i_pages in struct address_mapping for file-backed memory.
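For readers who want something concrete, the "logical page table" described above can be pictured with a small userspace sketch. This is not the code from the patch: the patch uses an xarray inside struct vm_object, whereas the sketch below substitutes a hand-rolled two-level table, and all names (logical_pt, lpt_set, lpt_get, enum map_state) are hypothetical. It only illustrates the idea of recording, per virtual page, whether the page is mapped by the CPU, by a device, or not at all, independent of any PTE format.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT 12                 /* assumed 4 KiB pages */
#define LEAF_BITS  9
#define LEAF_SLOTS (1u << LEAF_BITS)
#define ROOT_SLOTS 512                /* covers 512 * 512 pages = 1 GiB */

/* Hypothetical mapping states; the real struct gm_mapping is richer. */
enum map_state { MAP_NONE = 0, MAP_CPU, MAP_DEVICE };

/* Stand-in for the xarray: a sparse two-level table keyed by page index. */
struct logical_pt {
    uint8_t *leaves[ROOT_SLOTS];      /* leaf arrays are allocated lazily */
};

/* Record the state of the page containing vaddr; returns 0 or -1. */
static int lpt_set(struct logical_pt *pt, uint64_t vaddr, enum map_state st)
{
    uint64_t idx = vaddr >> PAGE_SHIFT;
    uint64_t root = idx >> LEAF_BITS;

    assert(pt != NULL);
    if (root >= ROOT_SLOTS)
        return -1;                    /* outside the range this toy covers */
    if (!pt->leaves[root]) {
        pt->leaves[root] = calloc(LEAF_SLOTS, sizeof(uint8_t));
        if (!pt->leaves[root])
            return -1;
    }
    pt->leaves[root][idx & (LEAF_SLOTS - 1)] = (uint8_t)st;
    return 0;
}

/* Look up the state of the page containing vaddr. */
static enum map_state lpt_get(const struct logical_pt *pt, uint64_t vaddr)
{
    uint64_t idx = vaddr >> PAGE_SHIFT;
    uint64_t root = idx >> LEAF_BITS;

    assert(pt != NULL);
    if (root >= ROOT_SLOTS || !pt->leaves[root])
        return MAP_NONE;              /* never recorded: treat as unmapped */
    return (enum map_state)pt->leaves[root][idx & (LEAF_SLOTS - 1)];
}
```

Note that every tracked page costs metadata outside the page tables, which is the extra memory consumption the patch description itself mentions.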

A file already has a tree structure (pagecache) to manage the pages that are theoretically mapped. It's easy to translate from a VMA to a page inside that tree structure that is currently not present in page tables.
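As a rough illustration of how cheap that translation is, here is a userspace sketch of the arithmetic. The struct demo_vma and demo_page_index() names are made up for this example; the fields mirror vm_start, vm_end and vm_pgoff, and the calculation follows the same idea as the kernel's linear_page_index() for ordinary (non-hugetlb) mappings, assuming 4 KiB pages.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumed 4 KiB pages */

/* Hypothetical stand-in for the relevant struct vm_area_struct fields. */
struct demo_vma {
    uint64_t vm_start;  /* first virtual address covered by the VMA */
    uint64_t vm_end;    /* first address past the end of the VMA */
    uint64_t vm_pgoff;  /* offset into the file, in pages */
};

/* Page index inside the file's tree for a virtual address in the VMA. */
static uint64_t demo_page_index(const struct demo_vma *vma, uint64_t addr)
{
    assert(addr >= vma->vm_start && addr < vma->vm_end);
    return vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
}
```

The resulting index is what one would use to look up the page in the file's tree, whether or not a PTE for it currently exists.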

Why the need for that tree structure if you can just remove anon memory from the picture?


The idea of struct vm_object is originated from FreeBSD VM design, which
provides a unified abstraction for anonymous memory, file-backed memory,
page cache and etc[1].

:/

Currently, Linux utilizes a set of hierarchical page walk functions to
abstract page table manipulations of different CPU architecture. The
problem happens when a device wants to reuse Linux MM code to manage its
page table -- the device page table may not be accessible to the CPU.
Existing solution like Linux HMM utilizes the MMU notifier mechanisms to
invoke device-specific MMU functions, but relies on encoding the mapping
status on the CPU page table entries. This entangles machine-independent
code with machine-dependent code, and also brings unnecessary restrictions.

Why? We have primitives to walk arch page tables in a non-arch-specific fashion and are using them all over the place.

We even have various mechanisms to map something into the page tables and get the CPU to fault on it, as if it is inaccessible (PROT_NONE as used for NUMA balancing, fake swap entries).
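A userspace analogy may help show the effect. The sketch below is not how NUMA balancing is implemented in the kernel; it only uses mprotect(PROT_NONE) so that a page stays populated but any access faults, and a SIGSEGV handler then "resolves" the fault by restoring access. All demo_* names are hypothetical, and calling mprotect() from a signal handler is an assumption that works on Linux in practice but is not guaranteed by POSIX.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static char *demo_page;                    /* page we make inaccessible */
static size_t demo_page_size;
static volatile sig_atomic_t demo_faults;  /* faults taken on demo_page */

/* Fault handler: if the fault hit our page, restore access and retry. */
static void demo_on_segv(int sig, siginfo_t *info, void *ucontext)
{
    char *addr = (char *)info->si_addr;

    (void)sig;
    (void)ucontext;
    if (demo_page && addr >= demo_page && addr < demo_page + demo_page_size) {
        demo_faults++;
        if (mprotect(demo_page, demo_page_size, PROT_READ | PROT_WRITE) == 0)
            return;                        /* faulting load is restarted */
    }
    _exit(1);                              /* unrelated fault: give up */
}

/* Returns 0 if the access faulted exactly once and the data survived. */
static int demo_touch_protected_page(void)
{
    struct sigaction sa = {0}, old;
    void *mem;
    int value;

    demo_page_size = (size_t)sysconf(_SC_PAGESIZE);
    mem = mmap(NULL, demo_page_size, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    demo_page = mem;
    demo_page[0] = 42;                     /* populate the page */

    sa.sa_sigaction = demo_on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -1;

    demo_faults = 0;
    if (mprotect(demo_page, demo_page_size, PROT_NONE) != 0)
        return -1;                         /* still mapped, now inaccessible */
    value = *(volatile char *)demo_page;   /* faults, handler fixes it up */

    sigaction(SIGSEGV, &old, NULL);
    munmap(demo_page, demo_page_size);
    demo_page = NULL;
    return (value == 42 && demo_faults == 1) ? 0 : -1;
}
```

The point of the analogy: the page contents stay where they are, and only the access permission is used to force a fault that software can then handle.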

The PTE size and format vary arch by arch, which harms the extensibility.

Not really.

We might have some features limited to some architectures because of the lack of PTE bits. And usually the problem is that people don't care enough about enabling these features on older architectures.

If we ever *really* need more space for sw-defined data, it would be possible to allocate auxiliary data for page tables only where required (where the features apply), instead of crafting a completely new, auxiliary data structure with its own locking.

So far it was not required to enable the feature we need on the architectures we care about.


[1] https://docs.freebsd.org/en/articles/vm-design/

In the cover letter you have:

"The future plan of logical page table is to provide a generic abstraction layer that support common anonymous memory (I am looking at you, transparent huge pages) and file-backed memory."

Which I doubt will happen; there is little interest in making anonymous memory management slower, more serialized, and wasting more memory on metadata.

Note that you won't make many friends around here with statements like "To be honest, not using a logical page table for anonymous memory is why Linux THP fails compared with FreeBSD's superpage".

I read one paper that makes such claims (I'm curious how you define "winning"), and am aware of some shortcomings. But I am not convinced that the lack of a second data structure "is why Linux THP fails". It just requires some more work to get it sorted under Linux (e.g., allocate a THP, PTE-map it and map the inaccessible parts PROT_NONE, later collapse it in-place into a PMD), and so far, there has not been a lot of interest that I am aware of in even starting to work on that.

So if there is not enough pain for companies to even work on that in Linux, maybe FreeBSD superpages are "winning" "on paper" only? Remember that the target audience here are Linux developers.


But yeah, this here is all designed around the idea "core MM is extended to handle a unified virtual address space with external accelerators using customized MMUs." and then trying to find other arguments why it's a good idea, without going too much into detail why it's all unsolvable without that.

The first thing to sort out is whether we even want that, and some discussions here already went in the direction of "likely not". Let's see.

--
Cheers,

David / dhildenb



