Re: Page zapping and page table reclaim

On 19.03.21 18:04, Yang Shi wrote:
On Thu, Mar 11, 2021 at 1:35 PM David Hildenbrand <david@xxxxxxxxxx> wrote:

On 11.03.21 22:26, Peter Xu wrote:
On Thu, Mar 11, 2021 at 07:14:02PM +0100, David Hildenbrand wrote:
I was wondering, is there any mechanism that reclaims basically empty page
tables in a running process?

Would munmap() count? :)

Haha, no -- also not mmap(FIXED) or mremap(FIXED) ;)

As so often lately, the use case is sparse memory mappings where we

a) may want to reuse the area later,
b) don't want to hold the mmap lock in write mode while optimizing, and
c) don't want to create a lot of individual mappings that we might not
be able to merge again.

Will the below work for you?

1. acquire write mmap lock
2. unlink vmas from the list and rbtree (so the vmas won't be visible
to any concurrent readers/writers)
3. downgrade write lock to read lock
4. zap page tables and free page tables
5. upgrade to write lock
6. relink vmas back to list and rbtree

Actually the current implementation of munmap() does the first 5 steps.

That's almost mmap(MAP_FIXED) for the cases where we can merge VMAs. But I don't think this is actually what we want. We don't want to do such optimizations while we're in MADV_DONTNEED etc., which only hold the mmap lock in read mode.


Simple example: QEMU implements memory ballooning for its VMs via virtio-balloon. When the guest inflates/deflates 4k pages and we're using anonymous memory, we issue a madvise(MADV_DONTNEED) syscall for each 4k page. At some point, we might be able to reclaim page tables. But we don't want to suddenly take the mmap lock in write mode during madvise() when there is no actual memory pressure, or scan for optimization opportunities during every syscall. User space pretty much relies on madvise(MADV_DONTNEED) being fast and minimally intrusive.
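
To make the pattern above concrete, here is a minimal user-space sketch. It is not QEMU code; the names read_vmpte_kb(), sparse_discard_demo() and PT_SPAN are made up for illustration, and the 2 MiB stride assumes x86-64 with 4 KiB pages. It touches one page per stride in a sparse anonymous mapping, discards each page with its own madvise(MADV_DONTNEED) call, and samples VmPTE from /proc/self/status at each stage. Whether VmPTE drops again after the discards depends on the kernel version and configuration, so the code only records the values.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: one last-level page table covers 2 MiB (x86-64, 4 KiB pages). */
#define PT_SPAN (2UL << 20)

/* Return VmPTE (in kB) from /proc/self/status, or -1 if it cannot be read. */
static long read_vmpte_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        if (sscanf(line, "VmPTE: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

/*
 * Touch one page per PT_SPAN in a sparse anonymous mapping, then discard
 * every touched page with its own madvise(MADV_DONTNEED) call.
 * vmpte_kb[0..2] receive VmPTE before touching, after touching, and after
 * discarding. Returns 0 on success, -1 on error.
 */
static int sparse_discard_demo(size_t nspans, long vmpte_kb[3])
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = nspans * PT_SPAN;
    char *base;
    size_t i;
    int ret = 0;

    assert(nspans > 0 && page <= PT_SPAN);

    base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    vmpte_kb[0] = read_vmpte_kb();

    /* Fault in one page per stride, so each touch needs its own page table. */
    for (i = 0; i < nspans; i++)
        base[i * PT_SPAN] = 1;
    vmpte_kb[1] = read_vmpte_kb();

    /* One syscall per page, as in the balloon inflation case. */
    for (i = 0; i < nspans; i++) {
        if (madvise(base + i * PT_SPAN, page, MADV_DONTNEED) != 0) {
            ret = -1;
            break;
        }
    }
    vmpte_kb[2] = read_vmpte_kb();

    /* Private anonymous memory reads back as zero after MADV_DONTNEED. */
    if (ret == 0 && base[0] != 0)
        ret = -1;

    munmap(base, len);
    return ret;
}
```

Calling sparse_discard_demo(64, v) from main() and printing v[0], v[1] and v[2] shows how much page table memory the sparse touches cost, and whether the running kernel gives any of it back once the pages are discarded.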

I think there might be other cases where we can reclaim page tables as well, not necessarily triggered by user space. For example, after we have written back or evicted a sequence of file-mapped pages, I would assume that we might also be able to reclaim page tables, but I haven't looked into it yet. For now, I mostly care about page table reclaim for the cases where we discard pages from page tables completely (MADV_DONTNEED, MADV_FREE, MADV_REMOVE, fallocate(PUNCH_HOLE)).
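
For reference, the four discard operations listed above look roughly like this from user space. This is only a sketch: discard_demo() and the OP_* indices are hypothetical names, the shared mapping is backed by memfd_create() purely for convenience, and it assumes a reasonably recent Linux and glibc (MADV_FREE, memfd_create() and the FALLOC_FL_* flags all need to be available). Each call's return value is recorded instead of being treated as fatal, because support differs between kernels and filesystems.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

enum { OP_DONTNEED, OP_FREE, OP_REMOVE, OP_PUNCH_HOLE, OP_COUNT };

/*
 * Issue each discard operation once on a small test mapping.
 * result[op] is 0 if the call succeeded and -1 otherwise.
 * Returns 0 if the anonymous part of the demo worked, -1 on error.
 */
static int discard_demo(int result[OP_COUNT])
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = 4 * page;
    char *anon, *shm = MAP_FAILED;
    int fd, i, ret = 0;

    for (i = 0; i < OP_COUNT; i++)
        result[i] = -1;

    anon = mmap(NULL, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (anon == MAP_FAILED)
        return -1;
    memset(anon, 0xaa, len);

    /* Private anonymous memory: MADV_DONTNEED and MADV_FREE. */
    result[OP_DONTNEED] = madvise(anon, page, MADV_DONTNEED);
    result[OP_FREE] = madvise(anon + page, page, MADV_FREE);

    /* After MADV_DONTNEED the page must read back as zero. */
    if (result[OP_DONTNEED] == 0 && anon[0] != 0)
        ret = -1;

    /* Shared memory: MADV_REMOVE and fallocate(PUNCH_HOLE). */
    fd = memfd_create("discard-demo", 0);
    if (fd >= 0 && ftruncate(fd, (off_t)len) == 0)
        shm = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (shm != MAP_FAILED) {
        memset(shm, 0xbb, len);
        result[OP_REMOVE] = madvise(shm, page, MADV_REMOVE);
        result[OP_PUNCH_HOLE] =
            fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      (off_t)page, (off_t)page);
        munmap(shm, len);
    }
    if (fd >= 0)
        close(fd);

    munmap(anon, len);
    return ret;
}
```

All four calls remove pages from the mapping without changing the VMA layout, which is why the now-empty page tables are the only thing left behind.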


I envision page table reclaim happening asynchronously, either periodically once we are under memory pressure, or once there is sufficient evidence that reclaim might make sense. There, similarly to khugepaged, we might have to temporarily take the mmap lock in write mode for a short period of time, but I'll have to look into the details first.

--
Thanks,

David / dhildenb





