On 3/14/23 02:49, Andrew Morton wrote:
On Mon, 13 Mar 2023 20:45:21 +0800 Yin Fengwei <fengwei.yin@xxxxxxxxx> wrote:
This series brings batched rmap removal to try_to_unmap_one().
Batched rmap removal is expected to yield a performance gain over
removing the rmap one page at a time.
This series restructures try_to_unmap_one() from:

  loop:
      clear and update PTE
      unmap one page
      goto loop

to:

  loop:
      clear and update PTE
      goto loop
  unmap the range of the folio in one call
This is one step toward always mapping/unmapping the entire folio in
one call, which can simplify the folio mapcount handling by avoiding
per-page map/unmap accounting.
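A minimal kernel-style sketch of the restructured flow (illustrative
only: unmap_folio_range_sketch() and folio_remove_rmap_range() are
assumed names for this example, not necessarily the interface the
series actually adds):

  #include <linux/mm.h>
  #include <linux/rmap.h>

  /*
   * Sketch only: clear the PTEs covering a folio range first, then
   * drop the rmap for the whole range in one call. TLB flushing,
   * swap entries and error handling are omitted;
   * folio_remove_rmap_range() is an assumed helper name.
   */
  static void unmap_folio_range_sketch(struct folio *folio,
                                       struct vm_area_struct *vma,
                                       pte_t *pte, unsigned long addr,
                                       unsigned int nr)
  {
          unsigned int i;

          /* Phase 1: clear and update each PTE; defer rmap removal. */
          for (i = 0; i < nr; i++, pte++, addr += PAGE_SIZE) {
                  pte_t pteval = ptep_get_and_clear(vma->vm_mm, addr, pte);

                  if (pte_dirty(pteval))
                          folio_mark_dirty(folio);
          }

          /* Phase 2: remove the rmap for the whole range in one call. */
          folio_remove_rmap_range(folio, folio_page(folio, 0), nr, vma);
  }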
...
To demonstrate the performance gain, MADV_PAGEOUT was changed not to
split large page-cache folios, and a micro-benchmark was created
along the following lines:
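(The quote trims the benchmark itself; what follows is a hypothetical
sketch of such a benchmark, not the original. The file name
"testfile", the 1 GiB mapping size, and the access pattern are
assumptions for illustration.)

  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/mman.h>
  #include <unistd.h>

  #define LEN (1024UL * 1024 * 1024)        /* 1 GiB, illustrative */

  int main(void)
  {
          volatile char sum = 0;
          size_t i;
          char *p;
          int fd = open("testfile", O_RDONLY);

          if (fd < 0) {
                  perror("open");
                  return 1;
          }

          p = mmap(NULL, LEN, PROT_READ, MAP_SHARED, fd, 0);
          if (p == MAP_FAILED) {
                  perror("mmap");
                  return 1;
          }

          /* Populate the page cache (large folios if the fs supports them). */
          for (i = 0; i < LEN; i += 4096)
                  sum += p[i];

          /* Time this call: it drives the reclaim path for the whole range. */
          if (madvise(p, LEN, MADV_PAGEOUT))
                  perror("madvise");

          munmap(p, LEN);
          close(fd);
          return 0;
  }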
Please remind me why it's necessary to patch the kernel to actually
performance test this? And why it's proving so hard to demonstrate
benefits in real-world workloads?
(Yes, this was touched on in earlier discussion, but I do think these
considerations should be spelled out in the [0/N] changelog).
OK. What about adding the following to the cover letter:
"
The performance gain of this series can be demonstrated with large
folio reclaim. In the current kernel, the vmscan() path benefits from
the changes, but as far as I am aware there is no workload/benchmark
that shows the exact performance gain on the vmscan() path.
Another way to demonstrate the performance benefit is MADV_PAGEOUT,
which also triggers page reclaim. The problem is that MADV_PAGEOUT
always splits a large folio because it is not yet aware of large
folios in the page cache. To show the performance benefit,
MADV_PAGEOUT was updated not to split the large folio.
In the long term, with wider adoption of large folios in the kernel
(such as large folios for anonymous pages), MADV_PAGEOUT needs to be
updated to handle a large folio as a whole and avoid splitting it
unconditionally.
"
Regards
Yin, Fengwei