On 11.03.21 19:14, David Hildenbrand wrote:
Hi folks,
I was wondering, is there any mechanism that reclaims basically empty
page tables in a running process?
Like: When I MADV_DONTNEED a huge range, there could be plenty of
basically empty (e.g., all entries invalid) page tables we could
reclaim. As soon as we zap a complete PMD we could reclaim (depending on
the architecture) a whole page.
Zapping on the PMD level might have the most impact, I guess.
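
A minimal sketch of how one might observe this from user space, assuming
Linux and Python 3.8+ (for mmap.madvise). It populates an anonymous
private mapping, discards it with MADV_DONTNEED, and reads the VmPTE field
of /proc/self/status at each step. The helper names (vm_pte_kb, measure)
and the 256 MiB default are made up for illustration. With THP enabled the
populated number may be much smaller than the 4k case would suggest.

```python
import mmap


def vm_pte_kb(status_text):
    """Return the VmPTE value (in kB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmPTE:"):
            return int(line.split()[1])
    return None


def read_own_vm_pte_kb():
    # Linux-specific: page table memory charged to this process.
    with open("/proc/self/status") as f:
        return vm_pte_kb(f.read())


def measure(length=256 * 1024 * 1024):
    """Hypothetical experiment: returns (before, populated, after_discard) in kB."""
    before = read_own_vm_pte_kb()
    m = mmap.mmap(-1, length, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        # Touch one byte per page so that page tables get populated.
        for off in range(0, length, mmap.PAGESIZE):
            m[off] = 1
        populated = read_own_vm_pte_kb()
        # Discard the whole range; the entries are zapped, but the question
        # in this thread is whether the now-empty page tables go away.
        m.madvise(mmap.MADV_DONTNEED)
        after_discard = read_own_vm_pte_kb()
    finally:
        m.close()
    return before, populated, after_discard
```

Calling measure() on a Linux box and comparing "populated" with
"after_discard" shows whether the discard alone gave any page table
memory back.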
For 1 GB, we need 262144 4k pages. If we assume each PTE is 8 bytes, we
need a total of 2 MB for the lowest level page tables (PTE).
OTOH, we would need 512 PMD entries - a single 4k page. Zapping 1 TB
would mean we can free up another 4MB - rather a corner case and we can
live with that.
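
The arithmetic above can be sketched as follows, assuming 4k base pages,
8-byte entries (so 512 entries per table) and a fully aligned range. The
constants and function names are made up for illustration; other
architectures and configurations will differ.

```python
PAGE_SIZE = 4096                              # assumed base page size (4k)
ENTRY_SIZE = 8                                # assumed bytes per page table entry
ENTRIES_PER_TABLE = PAGE_SIZE // ENTRY_SIZE   # 512 entries fit in one 4k table


def _ceil_div(a, b):
    return -(-a // b)


def pte_table_bytes(mapped_bytes):
    """Memory needed for the lowest level (PTE) tables mapping mapped_bytes."""
    pages = _ceil_div(mapped_bytes, PAGE_SIZE)
    tables = _ceil_div(pages, ENTRIES_PER_TABLE)
    return tables * PAGE_SIZE


def pmd_table_bytes(mapped_bytes):
    """Memory needed for the PMD tables that point to those PTE tables."""
    pte_tables = pte_table_bytes(mapped_bytes) // PAGE_SIZE
    pmd_tables = _ceil_div(pte_tables, ENTRIES_PER_TABLE)
    return pmd_tables * PAGE_SIZE


GIB = 1 << 30
TIB = 1 << 40
print(pte_table_bytes(GIB))   # 2097152 bytes: PTE tables for 1 GB
print(pmd_table_bytes(GIB))   # 4096 bytes: a single 4k PMD table for 1 GB
print(pmd_table_bytes(TIB))   # 4194304 bytes: PMD tables for 1 TB
```

So nearly all of the reclaimable memory sits in the PTE tables; the PMD
level only adds about 0.2% on top.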
Of course, the same might apply to other cases where we can restore all
page table content from the VMA again. One example would be after
MADV_FREE zapped a whole range of entries we marked.
Looks like if we happen to zap a THP, we should already get what we want
(no page table, nothing to remove).
I haven't immediately stumbled over anything, but it could be that I am
missing the obvious. I guess what would need some thought is concurrent
discards/pagefaults - but it feels similar to collapsing/splitting a THP
while there is other system activity.
Maybe there is already something and I am just not aware of it.
Thanks!
Thanks for the feedback so far. I just did a very simple experiment:
1. Start a VM (QEMU) with 60 GB and populate/preallocate all page tables.
2. Inflate the memory balloon (virtio-balloon) in the VM to 58 GB
3. Wait until fully inflated
Before inflating the balloon: PageTables: 131760 kB
After inflating the balloon: No real change
Shutting down the VM: PageTables: 8064 kB
In comparison, starting a 2 GB VM and preallocating/populating all
memory: PageTables: 12660 kB
So in this case, there is quite some room for improvements (> 100 MiB).
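
For reference, a small sketch of how the numbers above can be read and
compared, assuming they come from the PageTables line in /proc/meminfo.
The function names are made up for illustration, and using the value
after VM shutdown as the baseline is only an approximation of what the
rest of the system needs.

```python
def parse_pagetables_kb(meminfo_text):
    """Return the PageTables value (in kB) from /proc/meminfo text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("PageTables:"):
            return int(line.split()[1])
    return None


def reclaimable_kb(with_vm_kb, baseline_kb):
    """Page table memory attributable to the VM, relative to a baseline."""
    return with_vm_kb - baseline_kb


# Values from the experiment above.
inflated = 131760   # kB, balloon inflated (no real change vs. before)
baseline = 8064     # kB, after shutting down the VM
delta = reclaimable_kb(inflated, baseline)
print(delta, "kB")               # 123696 kB
print(delta / 1024, "MiB")       # roughly 120.8 MiB
```

On a live system, the input would be the contents of /proc/meminfo, e.g.
parse_pagetables_kb(open("/proc/meminfo").read()).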
virtio-balloon discards in 4k granularity, which means that we'll never
get to zap whole THPs (the first discard will break up the THP) and
therefore never remove any page tables.
I'll try identifying other workloads/cases where such an optimization
is applicable and work on asynchronous page table reclaim. Thanks!
--
Thanks,
David / dhildenb