Re: [PATCH v6 18/18] arm64/mm: Automatically fold contpte mappings

On 2024/6/25 15:23, Baolin Wang wrote:


On 2024/6/25 11:16, Kefeng Wang wrote:


On 2024/6/24 23:56, Ryan Roberts wrote:
+ Baolin Wang and Yin Fengwei, who may be able to help with this.


Hi Kefeng,

Thanks for the report!


On 24/06/2024 15:30, Kefeng Wang wrote:
Hi Ryan,

There is a big regression in the page_fault3 ("Separate file shared mapping page
fault") testcase from will-it-scale [1] on arm64; there is no issue on x86:

./page_fault3_processes -t 128 -s 5

I see that this program is mkstemp'ing a file at "/tmp/willitscale.XXXXXX". Based on your description, I'm inferring that /tmp is backed by ext4 with your large
folio patches enabled?

Yes, /tmp is mounted as ext4; sorry, I forgot to mention that.



1) large folio disabled on ext4:
    92378735
2) large folio enabled on ext4 + CONTPTE enabled:
    16164943
3) large folio enabled on ext4 + CONTPTE disabled:
    80364074
4) large folio enabled on ext4 + CONTPTE enabled + large folio mapping
   enabled in finish_fault() [2]:
    299656874

We found that *contpte_convert* consumes a lot of CPU time (76%) in case 2),

contpte_convert() is expensive and to be avoided. In this case I expect it is repainting the PTEs with the PTE_CONT bit added in, and to do that it needs to invalidate the TLB for the virtual range. The code is there to mop up user space patterns where each page in a range is temporarily made RO, then later changed back; in that case we want to re-fold the contpte range once all the pages have
been serviced in RO mode.

Of course this path is only intended as a fallback; the more optimal approach is to set_ptes() the whole folio in one go where possible - kind of
what you are doing below.
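
For reference, the fallback repaint looks roughly like the sketch below. This is a simplified paraphrase (from memory) of contpte_convert() in arch/arm64/mm/contpte.c rather than the exact code, so treat it as a sketch; the TLB invalidation in the middle is the part that shows up in the profile:

static void contpte_convert(struct mm_struct *mm, unsigned long addr,
                            pte_t *ptep, pte_t pte)
{
        struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
        unsigned long start_addr;
        pte_t *start_ptep;
        int i;

        /* Operate on the naturally aligned 16-entry (64K) block. */
        start_ptep = ptep = contpte_align_down(ptep);
        start_addr = addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
        pte = pfn_pte(ALIGN_DOWN(pte_pfn(pte), CONT_PTES), pte);

        /* Clear every entry, accumulating the dirty/young bits. */
        for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE) {
                pte_t ptent = __ptep_get_and_clear(mm, addr, ptep);

                if (pte_dirty(ptent))
                        pte = pte_mkdirty(pte);
                if (pte_young(ptent))
                        pte = pte_mkyoung(pte);
        }

        /* The expensive part: invalidate the TLB for the whole range. */
        __flush_tlb_range(&vma, start_addr, addr, PAGE_SIZE, true, 3);

        /* Rewrite all 16 entries in one call, with the value the caller
         * computed (pte_mkcont() on the fold path). */
        __set_ptes(mm, start_addr, start_ptep, pte, CONT_PTES);
}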

and it disappears with the following change [2]. It is easy to understand the difference between case 2)
and case 4): case 2) always maps a single page at a time and then tries to
fold the contpte mapping on every fault, which spends a lot of
time. Case 4) is only a workaround; is there any better suggestion?

See below.


Thanks.

[1] https://github.com/antonblanchard/will-it-scale
[2] enable large folio mapping in finish_fault()

diff --git a/mm/memory.c b/mm/memory.c
index 00728ea95583..5623a8ce3a1e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4880,7 +4880,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
          * approach also applies to non-anonymous-shmem faults to avoid
          * inflating the RSS of the process.
          */
-       if (!vma_is_anon_shmem(vma) || unlikely(userfaultfd_armed(vma))) {
+       if (unlikely(userfaultfd_armed(vma))) {

The change to make finish_fault() handle multiple pages in one go is new; it was added by Baolin Wang at [1]. The extra conditional that you have removed is there to
prevent RSS reporting bloat. See the discussion that starts at [2].

Anyway, it was my vague understanding that the fault around mechanism
(do_fault_around()) would ensure that (by default) 64K worth of pages get mapped
together in a single set_ptes() call, via filemap_map_pages() ->
filemap_map_folio_range(). Looking at the code, I guess fault around only
applies to read faults. This test is doing a write fault.
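
For reference, the batched read path I have in mind is roughly the sketch below (a simplified call chain; function names as in mm/memory.c and mm/filemap.c, intermediate steps omitted):

/*
 * Read fault with fault-around (sketch, not verbatim):
 *
 *   do_read_fault()
 *     -> do_fault_around()              // default window is 64K
 *       -> filemap_map_pages()
 *         -> filemap_map_folio_range()
 *           -> set_pte_range()          // one set_ptes() covering the
 *                                       // whole run, so a contpte block
 *                                       // is written (and folded) once
 *
 * A shared write fault instead goes do_shared_fault() -> finish_fault(),
 * which installs a single PTE per fault, so every fault re-triggers the
 * contpte fold attempt.
 */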

I guess we need a change a bit like the one you have made, but one that also
takes the fault_around configuration into account?

For writable mmap() of tmpfs, we will use the mTHP interface to control the size of the folio to allocate, as discussed in a previous meeting [1], so I don't think the fault_around configuration will be helpful for tmpfs.

Yes, tmpfs is different from ext4.


For other filesystems, like ext4, I did not find the logic that determines what size of folio to allocate in the writable mmap() path (Kefeng, please correct me if I missed something). If there were a control like mTHP, could we rely on that instead of 'fault_around'?

For ext4 and most other filesystems, the folio is allocated from filemap_fault();
we don't have an explicit interface like mTHP to control the folio size.
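
As far as I can see, the order on that path just falls out of readahead; roughly (a simplified sketch of the call chain, with intermediate helpers omitted, and names may vary by kernel version):

/*
 * Where the folio size comes from for a regular file fault (sketch):
 *
 *   filemap_fault()
 *     -> do_sync_mmap_readahead()
 *       -> page_cache_sync_ra() -> ... -> page_cache_ra_order()
 *                                  // picks a folio order from the
 *                                  // readahead window, capped by
 *                                  // MAX_PAGECACHE_ORDER and what the
 *                                  // filesystem/mapping supports
 *
 * so there is no explicit per-size knob like the mTHP sysfs interface
 * on this path.
 */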



[1] https://lore.kernel.org/all/f1783ff0-65bd-4b2b-8952-52b6822a0835@xxxxxxxxxx/

Yes, the current change is not enough; I hit some issues and am still debugging. So our direction is to try to map large folios in do_shared_fault(), right?

I think this is the right direction. I added the '!vma_is_anon_shmem(vma)' condition so that we can gradually implement support for building large folio mappings, especially writable mmap() support in tmpfs.

[1]
https://lore.kernel.org/all/3a190892355989d42f59cf9f2f98b94694b0d24d.1718090413.git.baolin.wang@xxxxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/linux-mm/13939ade-a99a-4075-8a26-9be7576b7e03@xxxxxxx/



