Re: [RFC PATCH v3 0/6] Removing limitations of merging anonymous VMAs

Vlastimil Babka <vbabka@xxxxxxx> · Fri, 20 May 2022 14:22:59 +0200

On 5/17/22 18:44, Kirill A. Shutemov wrote:
> On Mon, May 16, 2022 at 02:53:59PM +0200, Jakub Matěna wrote:
>> This is a series of patches that try to improve merge success rate when
>> VMAs are being moved, resized or otherwise modified.
>> 
>> Motivation
>> In the current kernel it is impossible to merge two anonymous VMAs
>> if one of them was moved. That is because VMA's page offset is
>> set according to the virtual address where it was created and in
>> order to merge two VMAs page offsets need to follow up.
>> Another problem when merging two faulted VMA's is their anon_vma. In
>> current kernel these anon_vmas have to be the one and the same.
>> Otherwise merge is again not allowed.
>> There are several places from which vma_merge() is called and therefore
>> several use cases that might profit from this upgrade. These include
>> mmap (that fills a hole between two VMAs), mremap (that moves VMA next
>> to another one or again perfectly fills a hole), mprotect (that modifies
>> protection and allows merging with a neighbor) and brk (that expands VMA
>> so that it is adjacent to a neighbor).
>> Missed merge opportunities increase the number of VMAs of a process
>> and in some cases can cause problems when a max count is reached.
> 
> Hm. You are talking about missed opportunities, but do you know any
> workload that would measurably benefit from the change?

We do know about a workload that originally inspired this investigation of
feasibility, but it's proprietary and will take a while to evaluate the
benefits there. We did hope that a public RFC could lead to discovering
others that also have a workload that would benefit, and might currently use
some userspace workarounds due to the existing limitations.

> The changes are not trivial. And rmap code is complex enough as it is.

True, it was one of the goals, to see how complex exactly it would be. And
an opportunity to better document related parts of mm as part of the master
thesis :)

> I expect common cases to get slower due to additional checks that do not
> result in more merges.

Stats so far have shown that merges that this enables did happen, only a few
percent cases didn't. Of course for many workloads the extra merges will not
bring much benefit. One possibility is to introduce an opt-in mode (prctl or
madvise?) for workloads that know they would benefit.

> I donno, the effort looks dubious to me as of now.
 At least patches 1+2 could be considered immediately, as they don't bring
extra complexity.

A related issue which was brought to our attention is that current mremap()
implementation doesn't work on a range that spans multiple vma's. The
multiple vma's may be result of the current insufficient merging, or
otherwise. And it's tedious for userspace to discover the boundaries from
/proc/pid/maps to guide a mremap() vma by vma. More sucessful merging would
thus help, but it should be also possible to improve the mremap()
implementation, which shouldn't be as complex...