On 3/3/23 10:26, Yin, Fengwei wrote:
>
>
> On 3/2/2023 10:23 PM, David Hildenbrand wrote:
>> On 02.03.23 14:32, Yin, Fengwei wrote:
>>>
>>>
>>> On 3/2/2023 6:04 PM, David Hildenbrand wrote:
>>>> On 01.03.23 02:44, Yin, Fengwei wrote:
>>>>> On Tue, 2023-02-28 at 12:28 -0800, Andrew Morton wrote:
>>>>>> On Tue, 28 Feb 2023 20:23:03 +0800 Yin Fengwei
>>>>>> <fengwei.yin@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Testing done with the V2 patchset in a qemu guest
>>>>>>> with 4G mem + 512M zram:
>>>>>>> - kernel mm selftest to trigger vmscan() and finally hit
>>>>>>>   try_to_unmap_one().
>>>>>>> - Inject hwpoison into a hugetlb page to trigger a
>>>>>>>   try_to_unmap_one() call against hugetlb.
>>>>>>> - 8 hours of stress testing: Firefox + kernel mm selftest +
>>>>>>>   kernel build.
>>>>>>
>>>>>> Was any performance testing done with these changes?
>>>>> I tried to collect performance data, but found that it's not easy
>>>>> to trigger the try_to_unmap_one() path (the only trigger I noticed
>>>>> is page cache reclaim), and I am not aware of a workload that
>>>>> exercises it. Do you have any workloads to suggest? Thanks.
>>>>
>>>> If it barely happens, why care about performance and have a
>>>> "398 insertions(+), 260 deletions(-)"?
>>> I mean I can't find a workload that triggers page cache reclaim and
>>> measures its performance. We can do "echo 1 > /proc/sys/vm/drop_caches"
>>> to reclaim the page cache, but there is no obvious indicator that shows
>>> the advantage of this patchset. Maybe I could try eBPF to capture some
>>> statistics from try_to_unmap_one()?
>>
>> If no workload/benchmark is affected (or only corner cases where nobody
>> cares about performance), I hope you understand that it's hard to argue
>> why we should care about such an optimization.
> Yes. I understand this.
>
>>
>> I briefly thought that page migration could benefit, but it always uses
>> try_to_migrate().
> Yes. try_to_migrate() shares very similar logic with try_to_unmap_one().
> The same batched operation applies to try_to_migrate() as well.
>
>>
>> So I guess we're fairly limited to vmscan (memory failure is a corner
>> case).
> Agree.
>
>>
>> I recall that there are some performance-sensitive swap-to-nvdimm test
>> cases. As an alternative, one could eventually write a microbenchmark
>> that measures MADV_PAGEOUT performance -- it should also end up
>> triggering vmscan, but only if the page is mapped exactly once (in
>> which case, I assume batch removal doesn't really help?).
> Yes. MADV_PAGEOUT can trigger vmscan. My understanding is that a folio
> mapped only once could also benefit from the batched operation. Let me
> try to build a microbenchmark based on MADV_PAGEOUT and see what we
> could get. Thanks.

I checked MADV_PAGEOUT; it can't benefit from this series because the
large folio is split. I suppose we will later update MADV_PAGEOUT to
support reclaiming a large folio without splitting it.

Regards
Yin, Fengwei

>
>
> Regards
> Yin, Fengwei
>
>>