Hi all,

Since time immemorial the kernel has maintained two separate realms within mm - that of file-backed mappings and that of anonymous mappings.

Each of these requires a reverse mapping from folio to VMA, utilising interval trees from an intermediate object referenced by folio->mapping back to the VMAs which map it.

In the case of a file-backed mapping, this 'intermediate object' is the shared page cache entry, of type struct address_space. It is non-CoW, which keeps things simple(-ish), and the concept is straightforward - both the folio and the VMAs which map the page cache object reference it.

In the case of anonymous memory, things are not quite as simple, as a result of CoW. This is further complicated by forking and the very many different combinations of CoW'd and non-CoW'd folios that can exist within a mapping.

This kind of mapping utilises struct anon_vma objects which, as a result of this complexity, are pretty well entirely concerned with maintaining the notion of an anon_vma object rather than describing the underlying memory in any way.

Of course we can enter further realms of insan^W^W^W^W^Wcomplexity by maintaining a MAP_PRIVATE file-backed mapping where we can experience both at once!

The fact that we can have both CoW'd and non-CoW'd folios referencing a VMA means that we require -yet another- type, a struct anon_vma_chain, maintained on a linked list, to abstract the link between anon_vma objects and VMAs, and to provide a means by which one can manage and traverse anon_vma objects from the VMA as well as look them up from the reverse mapping.

Maintaining all of this correctly is very fragile, error-prone and confusing, not to mention the concerns around maintaining correct locking semantics, correctly propagating anonymous VMA state on fork, and trying to reuse state to avoid allocating unnecessary memory to maintain all of this infrastructure.
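
To make the object relationships described above a little more concrete, here is a toy user-space model. It is only a sketch and not the kernel's definitions: the struct layouts, the helper names (link_anon(), link_file(), rmap_count()) and the MAPPING_ANON tag are all simplifications invented for illustration, and singly-linked lists stand in for the interval trees. The one detail borrowed from the kernel is the idea that the low bit of folio->mapping distinguishes an anon_vma pointer from an address_space pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and layouts are illustrative assumptions. */
#define MAPPING_ANON ((uintptr_t)0x1) /* low-bit tag on folio->mapping */

struct vma;
struct anon_vma_chain;

/* File-backed side: the page cache object that folios and VMAs both reference. */
struct address_space {
	struct vma *mappers;                      /* stand-in for an interval tree of VMAs */
};

/* Anonymous side: the intermediate object an anonymous folio points at. */
struct anon_vma {
	struct anon_vma_chain *chains;            /* stand-in for an interval tree */
};

/* Links one VMA to one anon_vma; after fork a VMA can carry several. */
struct anon_vma_chain {
	struct vma *vma;
	struct anon_vma *anon_vma;
	struct anon_vma_chain *next_for_vma;      /* list hanging off the VMA */
	struct anon_vma_chain *next_for_anon_vma; /* list hanging off the anon_vma */
};

struct vma {
	struct address_space *file_mapping;       /* NULL for purely anonymous VMAs */
	struct vma *next_file_mapper;
	struct anon_vma_chain *chain;
};

struct folio {
	uintptr_t mapping;                        /* tagged pointer, see MAPPING_ANON */
};

static void folio_set_anon(struct folio *f, struct anon_vma *av)
{
	f->mapping = (uintptr_t)av | MAPPING_ANON;
}

static void folio_set_file(struct folio *f, struct address_space *as)
{
	f->mapping = (uintptr_t)as;
}

static int folio_is_anon(const struct folio *f)
{
	return (f->mapping & MAPPING_ANON) != 0;
}

/* Fill in a caller-provided chain entry tying @vma to @av in both directions. */
static void link_anon(struct anon_vma_chain *avc, struct vma *vma,
		      struct anon_vma *av)
{
	avc->vma = vma;
	avc->anon_vma = av;
	avc->next_for_vma = vma->chain;
	vma->chain = avc;
	avc->next_for_anon_vma = av->chains;
	av->chains = avc;
}

static void link_file(struct vma *vma, struct address_space *as)
{
	vma->file_mapping = as;
	vma->next_file_mapper = as->mappers;
	as->mappers = vma;
}

/* Count the candidate VMAs reachable from a folio via its reverse mapping. */
static size_t rmap_count(const struct folio *f)
{
	size_t n = 0;

	if (folio_is_anon(f)) {
		const struct anon_vma *av =
			(const struct anon_vma *)(f->mapping & ~MAPPING_ANON);
		const struct anon_vma_chain *c;

		for (c = av->chains; c; c = c->next_for_anon_vma)
			n++;
	} else {
		const struct address_space *as =
			(const struct address_space *)f->mapping;
		const struct vma *v;

		for (v = as->mappers; v; v = v->next_file_mapper)
			n++;
	}
	return n;
}
```

In this model a fork can be imitated by linking a child VMA both to the parent's anon_vma and to a fresh one of its own: a folio still tagged with the parent's anon_vma is then reachable from both VMAs, while a folio CoW'd in the child is reachable only from the child. Calling link_file() on the same VMA as well gives the MAP_PRIVATE file-backed case, where both kinds of folio sit behind a single VMA.
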

An additional consequence of maintaining these two realms is that what straddles them - shmem - becomes something of an enigma - file-backed, but existing on the anonymous LRU list and requiring a lot of very specific handling.

It is obvious that there is some isomorphism between the representation of file systems and anonymous memory, less the CoW handling. However, there is a concept which exists within file systems which can somewhat bridge the gap - reflinks.

A future where we unify anonymous and file-backed memory mappings would be one in which reflinks were implemented at a general level rather than, as they are now, implemented individually within file systems.

I'd like to discuss how feasible doing so might be, whether this is a sane line of thought at all, and how a roadmap for working towards the elimination of anon_vma as it stands might look.

As with my other proposal, I will gather more concrete information before LSF to ensure the discussion is specific, and of course I would be interested to discuss the topic in this thread also!

Thanks!