On Fri, Dec 1, 2023 at 9:23 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 28.11.23 13:50, Weixi Zhu wrote:
> > This patch adds an abstraction layer, struct vm_object, that maintains
> > per-process virtual-to-physical mapping status stored in struct gm_mapping.
> > For example, a virtual page may be mapped to a CPU physical page or to a
> > device physical page. Struct vm_object effectively maintains an
> > arch-independent page table, which is defined as a "logical page table".
> > While arch-dependent page table used by a real MMU is named a "physical
> > page table". The logical page table is useful if Linux core MM is extended
> > to handle a unified virtual address space with external accelerators using
> > customized MMUs.
>
> Which raises the question why we are dealing with anonymous memory at
> all? Why not go for shmem if you are already only special-casing VMAs
> with a MMAP flag right now?
>
> That would maybe avoid having to introduce controversial BSD design
> concepts into Linux, that feel like going a step backwards in time to me
> and adding *more* MM complexity.
>
> >
> > In this patch, struct vm_object utilizes a radix
> > tree (xarray) to track where a virtual page is mapped to. This adds extra
> > memory consumption from xarray, but provides a nice abstraction to isolate
> > mapping status from the machine-dependent layer (PTEs). Besides supporting
> > accelerators with external MMUs, struct vm_object is planned to further
> > union with i_pages in struct address_mapping for file-backed memory.
>
> A file already has a tree structure (pagecache) to manage the pages that
> are theoretically mapped. It's easy to translate from a VMA to a page
> inside that tree structure that is currently not present in page tables.
>
> Why the need for that tree structure if you can just remove anon memory
> from the picture?
>
> >
> > The idea of struct vm_object is originated from FreeBSD VM design, which
> > provides a unified abstraction for anonymous memory, file-backed memory,
> > page cache and etc[1].
>
> :/
>
> > Currently, Linux utilizes a set of hierarchical page walk functions to
> > abstract page table manipulations of different CPU architecture. The
> > problem happens when a device wants to reuse Linux MM code to manage its
> > page table -- the device page table may not be accessible to the CPU.
> > Existing solution like Linux HMM utilizes the MMU notifier mechanisms to
> > invoke device-specific MMU functions, but relies on encoding the mapping
> > status on the CPU page table entries. This entangles machine-independent
> > code with machine-dependent code, and also brings unnecessary restrictions.
>
> Why? We have primitives to walk arch page tables in a non-arch specific
> fashion and are using them all over the place.
>
> We even have various mechanisms to map something into the page tables
> and get the CPU to fault on it, as if it is inaccessible (PROT_NONE as
> used for NUMA balancing, fake swap entries).
>
> > The PTE size and format vary arch by arch, which harms the extensibility.
>
> Not really.
>
> We might have some features limited to some architectures because of the
> lack of PTE bits. And usually the problem is that people don't care
> enough about enabling these features on older architectures.
>
> If we ever *really* need more space for sw-defined data, it would be
> possible to allocate auxiliary data for page tables only where required
> (where the features apply), instead of crafting a completely new,
> auxiliary datastructure with its own locking.
>
> So far it was not required to enable the feature we need on the
> architectures we care about.
>
> >
> > [1] https://docs.freebsd.org/en/articles/vm-design/
>
> In the cover letter you have:
>
> "The future plan of logical page table is to provide a generic
> abstraction layer that support common anonymous memory (I am looking at
> you, transparent huge pages) and file-backed memory."
>
> Which I doubt will happen; there is little interest in making anonymous
> memory management slower, more serialized, and wasting more memory on
> metadata.

Also worth noting that:

1) The vm_objects of Mach VM (which FreeBSD inherited from the old BSD) aren't quite what's being described here; rather, they are something of a replacement for both anon_vma and address_space[1]. Very similarly to Linux, they take pages from vm_objects and map them into page tables using pmap (the big difference being anon memory, which on Linux keeps its bookkeeping in the page tables).

2) These vm_objects were a horrendous mistake (see CoW chaining), and FreeBSD has to go to great lengths to make them tolerable. The UVM paper/dissertation (by Charles Cranor) discusses these issues at length, and 20 years later it's still true.

3) Despite Linux MM having its warts, it's probably fair to consider it a solid improvement over FreeBSD MM or NetBSD UVM.

And finally, randomly tacking on core MM concepts from other systems is at best a *really weird* idea, particularly when they aren't even what was stated!

[1] If you really can't use PTEs, I don't see why you can't use file mappings and/or some vm_operations_struct workarounds, when the patch's vm_object is literally just an xarray with a different name.

--
Pedro