On 3/3/23 10:26, Yin, Fengwei wrote:
>
>
> On 3/2/2023 10:23 PM, David Hildenbrand wrote:
>> On 02.03.23 14:32, Yin, Fengwei wrote:
>>>
>>>
>>> On 3/2/2023 6:04 PM, David Hildenbrand wrote:
>>>> On 01.03.23 02:44, Yin, Fengwei wrote:
>>>>> On Tue, 2023-02-28 at 12:28 -0800, Andrew Morton wrote:
>>>>>> On Tue, 28 Feb 2023 20:23:03 +0800 Yin Fengwei
>>>>>> <fengwei.yin@xxxxxxxxx> wrote:
>>>>>>
>>>>>>> Testing done with the V2 patchset in a qemu guest
>>>>>>> with 4G mem + 512M zram:
>>>>>>> - kernel mm selftest to trigger vmscan() and finally hit
>>>>>>>   try_to_unmap_one().
>>>>>>> - Inject hwpoison into a hugetlb page to trigger a
>>>>>>>   try_to_unmap_one() call against hugetlb.
>>>>>>> - 8 hours of stress testing: Firefox + kernel mm selftest +
>>>>>>>   kernel build.
>>>>>>
>>>>>> Was any performance testing done with these changes?
>>>>> I tried to collect performance data, but found that it's not easy
>>>>> to trigger the try_to_unmap_one() path (the only trigger I noticed
>>>>> is page cache reclaim), and I am not aware of a workload that
>>>>> exercises it. Do you have any workloads to suggest? Thanks.
>>>>
>>>> If it barely happens, why care about performance and have a
>>>> "398 insertions(+), 260 deletions(-)"?
>>> I mean I can't find a workload that triggers page cache reclaim and
>>> measures its performance. We can do "echo 1 > /proc/sys/vm/drop_caches"
>>> to reclaim the page cache, but there is no obvious indicator that shows
>>> the advantage of this patchset. Maybe I could try eBPF to capture some
>>> statistics from try_to_unmap_one()?
>>
>> If no workload/benchmark is affected (or only corner cases where nobody
>> cares about performance), I hope you understand that it's hard to argue
>> why we should care about such an optimization.
> Yes. I understand this.
>
>>
>> I briefly thought that page migration could benefit, but it always uses
>> try_to_migrate().
> Yes. try_to_migrate() shares very similar logic with try_to_unmap_one().
> The same batched operation applies to try_to_migrate() as well.
>
>>
>> So I guess we're fairly limited to vmscan (memory failure is a corner
>> case).
> Agree.
>
>>
>> I recall that there are some performance-sensitive swap-to-nvdimm test
>> cases. As an alternative, one could eventually write a microbenchmark
>> that measures MADV_PAGEOUT performance -- it should also end up
>> triggering vmscan, but only if the page is mapped exactly once (in
>> which case, I assume batch removal doesn't really help?).
> Yes. MADV_PAGEOUT can trigger vmscan. My understanding is that a folio
> mapped only once could also benefit from the batched operation. Let me
> try to build a microbenchmark based on MADV_PAGEOUT and see what we
> could get. Thanks.

I checked MADV_PAGEOUT; it can't benefit from this series because the
large folio is split. I suppose we will later update MADV_PAGEOUT to
support reclaiming a large folio without splitting it.

Regards
Yin, Fengwei

>
>
> Regards
> Yin, Fengwei
>
>>