On 19.03.21 18:04, Yang Shi wrote:
On Thu, Mar 11, 2021 at 1:35 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
On 11.03.21 22:26, Peter Xu wrote:
On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
I was wondering, is there any mechanism that reclaims basically empty page
tables in a running process?
Would munmap() count? :)
Haha, no -- also not mmap(FIXED) or mremap(FIXED) ;)
As so often lately, the use case is sparse memory mappings where we
a) may want to reuse the area later,
b) don't want to hold the mmap lock in write mode while optimizing, and
c) don't want to create a lot of individual mappings that we might not
be able to merge again.
Will the below work for you?
1. acquire write mmap lock
2. unlink vmas from the list and rbtree (so the vmas won't be visible
to any concurrent readers/writers)
3. downgrade write lock to read lock
4. zap page tables and free page tables
5. upgrade to write lock
6. relink vmas back to list and rbtree
Actually the current implementation of munmap() does the first 4 steps.
That's almost mmap(MAP_FIXED) for the cases where we can merge VMAs. But
I don't think this is actually what we want. We don't want to do such
optimizations while we're in mmap-read-locked MADV_DONTNEED etc.
Simple example: QEMU implements memory ballooning for its VMs via
virtio-balloon. When the guest inflates/deflates 4k pages and we're
using anonymous memory, we issue madvise(MADV_DONTNEED) syscalls for
each 4k page. At some point, we might be able to reclaim page tables -
but we don't want to suddenly take the mmap lock in write during
madvise() when there is no actual memory pressure, or scan for
optimization opportunities during every syscall. User space pretty much
relies on madvise(MADV_DONTNEED) being fast and minimally intrusive.
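To make the ballooning pattern concrete, here is a minimal user-space sketch of discarding a single page of anonymous memory with madvise(MADV_DONTNEED). It is not QEMU's code: struct guest_ram, guest_ram_init() and guest_ram_discard_page() are hypothetical names used only for illustration, and the sketch assumes Linux with glibc.

```c
#define _DEFAULT_SOURCE          /* MAP_ANONYMOUS and madvise() on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical stand-in for guest RAM: private anonymous memory. */
struct guest_ram {
    unsigned char *base;
    size_t page_size;
    size_t npages;
};

/* Map npages of zero-filled anonymous memory. Returns 0 on success. */
static int guest_ram_init(struct guest_ram *ram, size_t npages)
{
    long ps = sysconf(_SC_PAGESIZE);
    void *p;

    if (ps <= 0)
        return -1;
    p = mmap(NULL, npages * (size_t)ps, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    ram->base = p;
    ram->page_size = (size_t)ps;
    ram->npages = npages;
    return 0;
}

/*
 * Discard the backing memory of one page. The mapping itself stays in
 * place, so the page can be touched again later and reads back as zero.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int guest_ram_discard_page(struct guest_ram *ram, size_t index)
{
    assert(index < ram->npages);
    return madvise(ram->base + index * ram->page_size,
                   ram->page_size, MADV_DONTNEED);
}
```

Each call frees the page itself, but the page table that mapped it stays allocated, which is the leftover this thread is about.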
I think there might be other cases where we can reclaim page tables as
well, not necessarily triggered by user space. For example, after we
wrote back/evicted a sequence of file-mapped pages, I would assume that
we might also be able to reclaim page tables, but I haven't looked into
it yet. For now, I mostly care about page table reclaim for the cases
where we discard pages from page tables completely (MADV_DONTNEED,
MADV_FREE, MADV_REMOVE, fallocate(PUNCH_HOLE)).
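For the file-backed variants in that list, a small sketch of how user space discards pages without unmapping anything might look as follows. It assumes Linux with glibc 2.27 or later (for memfd_create()) and a shmem-backed file. The helper names punch_hole(), remove_range() and demo_discard() are made up for this example.

```c
#define _GNU_SOURCE              /* memfd_create(), fallocate(), FALLOC_FL_* */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Discard [off, off + len) of the file, keeping the file size unchanged. */
static int punch_hole(int fd, off_t off, off_t len)
{
    return fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, off, len);
}

/* Discard the same kind of range, addressed through a shared mapping. */
static int remove_range(void *addr, size_t len)
{
    return madvise(addr, len, MADV_REMOVE);
}

/*
 * Map a 4-page memfd, dirty two pages, discard one via the fd and one via
 * the mapping, then check that both read back as zero.
 * Returns 0 if everything behaved as expected, -1 otherwise.
 */
static int demo_discard(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    unsigned char *p;
    int fd, rc = -1;

    if (ps <= 0)
        return -1;
    fd = memfd_create("pt-reclaim-demo", 0);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4 * ps) != 0) {
        close(fd);
        return -1;
    }
    p = mmap(NULL, (size_t)(4 * ps), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return -1;
    }

    p[1 * ps] = 0xAA;
    p[2 * ps] = 0xBB;
    if (punch_hole(fd, 1 * ps, ps) == 0 &&
        remove_range(p + 2 * ps, (size_t)ps) == 0 &&
        p[1 * ps] == 0 && p[2 * ps] == 0)
        rc = 0;

    munmap(p, (size_t)(4 * ps));
    close(fd);
    return rc;
}
```

In both cases the VMA is left untouched, so any page tables that became empty remain allocated until the range is unmapped.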
I envision page table reclaim happening asynchronously: periodically,
under memory pressure, or once there is sufficient evidence that
reclaim might make sense. There, similarly to khugepaged, we might have
to temporarily take the mmap lock in write mode for a short period of
time, but I'll have to look into the details first.
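As a rough illustration of what "evidence" could mean, the sketch below counts page tables that no longer map any populated page. It is purely a user-space model, not kernel code and not a proposed design. It assumes x86-64 with 4 KiB pages (512 entries per PTE table, so one 4 KiB table covers 2 MiB) and a range that starts on a table boundary. The present[] array and count_empty_pte_tables() are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: x86-64 with 4 KiB pages, i.e. 512 PTEs per page table. */
#define PTES_PER_TABLE 512

/*
 * present[i] says whether page i of the range is still populated.
 * Page 0 is assumed to be the first page covered by a PTE table.
 * Returns how many fully covered PTE tables map no populated page at
 * all and would therefore be candidates for reclaim.
 */
static size_t count_empty_pte_tables(const bool *present, size_t npages)
{
    size_t empty = 0;

    for (size_t t = 0; t + PTES_PER_TABLE <= npages; t += PTES_PER_TABLE) {
        bool any_present = false;

        for (size_t i = 0; i < PTES_PER_TABLE; i++) {
            if (present[t + i]) {
                any_present = true;
                break;
            }
        }
        if (!any_present)
            empty++;
    }
    return empty;
}
```

Under these assumptions, a 1 TiB sparse mapping that was touched everywhere and then discarded completely would still hold 524288 such tables, about 2 GiB of page table memory.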
--
Thanks,
David / dhildenb