On 5/17/22 18:44, Kirill A. Shutemov wrote:
> On Mon, May 16, 2022 at 02:53:59PM +0200, Jakub Matěna wrote:
>> This is a series of patches that try to improve merge success rate when
>> VMAs are being moved, resized or otherwise modified.
>>
>> Motivation
>> In the current kernel it is impossible to merge two anonymous VMAs
>> if one of them was moved. That is because VMA's page offset is
>> set according to the virtual address where it was created and in
>> order to merge two VMAs page offsets need to follow up.
>> Another problem when merging two faulted VMA's is their anon_vma. In
>> current kernel these anon_vmas have to be the one and the same.
>> Otherwise merge is again not allowed.
>> There are several places from which vma_merge() is called and therefore
>> several use cases that might profit from this upgrade. These include
>> mmap (that fills a hole between two VMAs), mremap (that moves VMA next
>> to another one or again perfectly fills a hole), mprotect (that modifies
>> protection and allows merging with a neighbor) and brk (that expands VMA
>> so that it is adjacent to a neighbor).
>> Missed merge opportunities increase the number of VMAs of a process
>> and in some cases can cause problems when a max count is reached.
>
> Hm. You are talking about missed opportunities, but do you know any
> workload that would measurably benefit from the change?

We do know about a workload that originally inspired this investigation
of feasibility, but it's proprietary and it will take a while to evaluate
the benefits there. We did hope that a public RFC could lead to
discovering others who also have a workload that would benefit, and who
might currently use some userspace workarounds due to the existing
limitations.

> The changes are not trivial. And rmap code is complex enough as it is.

True, it was one of the goals, to see how complex exactly it would be.
And an opportunity to better document related parts of mm as part of the
master's thesis :)

> I expect common cases to get slower due to additional checks that do not
> result in more merges.

Stats so far have shown that the merges this enables did happen; only in
a few percent of cases they didn't. Of course for many workloads the
extra merges will not bring much benefit. One possibility is to introduce
an opt-in mode (prctl or madvise?) for workloads that know they would
benefit.

> I donno, the effort looks dubious to me as of now.

At least patches 1+2 could be considered immediately, as they don't bring
extra complexity.

A related issue which was brought to our attention is that the current
mremap() implementation doesn't work on a range that spans multiple
vma's. The multiple vma's may be the result of the current insufficient
merging, or otherwise. And it's tedious for userspace to discover the
boundaries from /proc/pid/maps to guide a mremap() vma by vma. More
successful merging would thus help, but it should also be possible to
improve the mremap() implementation, which shouldn't be as complex...
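
To make the mremap() point a bit more concrete, here is a minimal userspace sketch, not part of the series. It maps two anonymous pages, splits them into two vma's with mprotect(), lets mprotect() merge them back, and finally splits again and tries to grow the whole range with a single mremap() call. The helper names (count_vmas(), run_demo(), struct demo_result) are made up for illustration. I'd expect the mremap() to fail with EFAULT on the kernels I'm aware of, but treat that as an assumption; the code only records the errno.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Count the entries in /proc/self/maps that overlap [start, end).
 * Returns -1 if the file cannot be opened.
 */
int count_vmas(unsigned long start, unsigned long end)
{
	FILE *f = fopen("/proc/self/maps", "r");
	char *line = NULL;
	size_t cap = 0;
	int n = 0;

	if (!f)
		return -1;
	while (getline(&line, &cap, f) != -1) {
		unsigned long s, e;

		/* Each line starts with "<start>-<end>" in hex. */
		if (sscanf(line, "%lx-%lx", &s, &e) == 2 && s < end && e > start)
			n++;
	}
	free(line);
	fclose(f);
	return n;
}

/* Hypothetical result record, for illustration only. */
struct demo_result {
	int vmas_initial;	/* right after mmap() */
	int vmas_split;		/* after mprotect() changed the second page */
	int vmas_merged;	/* after the original protection was restored */
	int mremap_errno;	/* errno of mremap() over the split range, 0 on success */
};

int run_demo(struct demo_result *r)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t len = 2 * page;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	unsigned long start = (unsigned long)p;
	void *q;

	if (p == MAP_FAILED)
		return -1;
	r->vmas_initial = count_vmas(start, start + len);

	/* Different protection on the second page splits the vma. */
	if (mprotect(p + page, page, PROT_READ))
		goto fail;
	r->vmas_split = count_vmas(start, start + len);

	/* Restoring the protection lets the two halves merge again. */
	if (mprotect(p + page, page, PROT_READ | PROT_WRITE))
		goto fail;
	r->vmas_merged = count_vmas(start, start + len);

	/* Split once more and try to grow the whole range in one call. */
	if (mprotect(p + page, page, PROT_READ))
		goto fail;
	q = mremap(p, len, 2 * len, MREMAP_MAYMOVE);
	if (q == MAP_FAILED) {
		r->mremap_errno = errno;
		munmap(p, len);
	} else {
		r->mremap_errno = 0;
		munmap(q, 2 * len);
	}
	return 0;
fail:
	munmap(p, len);
	return -1;
}
```

Calling run_demo() from a small main() and printing the four fields is enough to see the effect. The /proc/self/maps parsing in count_vmas() is exactly the kind of boundary discovery userspace has to do today if it wants to drive mremap() vma by vma.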