Re: [PATCH v2 6/7] x86: mm: free page table pages by RCU instead of semi RCU


On 2024/11/8 15:38, Qi Zheng wrote:
Hi Jann,

On 2024/11/8 06:39, Jann Horn wrote:
+x86 MM maintainers - x86@xxxxxxxxxx was already cc'ed, but I don't
know if that is enough for them to see it, and I haven't seen them
comment on this series yet; I think you need an ack from them for this
change.

Yes, thanks to Jann for cc-ing the x86 MM maintainers; looking forward to
their feedback!


On Thu, Oct 31, 2024 at 9:14 AM Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote:
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, the page table pages
will be freed by semi RCU, that is:

  - batch table freeing: asynchronous free by RCU
  - single table freeing: IPI + synchronous free

In this way, the page table can be traversed locklessly by disabling IRQs
in paths such as GUP-fast. But this is not enough to free the empty PTE
page table pages in paths other than the munmap and exit_mmap paths,
because the IPI cannot be synchronized with rcu_read_lock() in
pte_offset_map{_lock}().

In preparation for supporting reclamation of empty PTE page table pages,
let the single table also be freed by RCU like the batch table freeing.
Then we can also use pte_offset_map() etc. to prevent a PTE page from
being freed.

I applied your series locally and followed the page table freeing path
that the reclaim feature would use on x86-64. Looks like it goes like
this with the series applied:

Yes.


free_pte
   pte_free_tlb
     __pte_free_tlb
       ___pte_free_tlb
         paravirt_tlb_remove_table
           tlb_remove_table [!CONFIG_PARAVIRT, Xen PV, Hyper-V, KVM]
             [no-free-memory slowpath:]
               tlb_table_invalidate
               tlb_remove_table_one
                 tlb_remove_table_sync_one [does IPI for GUP-fast]

            ^
            It seems that this step can be omitted when
            CONFIG_PT_RECLAIM is enabled, because GUP-fast will
            disable IRQs, which can also serve as the RCU critical

                 __tlb_remove_table_one [frees via RCU]
             [fastpath:]
               tlb_table_flush
                 tlb_remove_table_free [frees via RCU]
           native_tlb_remove_table [CONFIG_PARAVIRT on native]
             tlb_remove_table [see above]

Basically, the only remaining case in which
paravirt_tlb_remove_table() does not use tlb_remove_table() with RCU
delay is !CONFIG_PARAVIRT && !CONFIG_PT_RECLAIM. Given that
CONFIG_PT_RECLAIM is defined as "default y" when supported, I guess
that means X86's direct page table freeing path will almost never be
used? If it stays that way and the X86 folks don't see a performance
impact from using RCU to free tables on munmap() / process exit, I
guess we might want to get rid of the direct page table freeing path
on x86 at some point to simplify things...

In theory, using RCU to asynchronously free PTE pages should make
munmap() / process exit path faster. I can try to grab some data.


I ran 'stress-ng --mmap 1 --mmap-bytes 1G', and grabbed the data with
bpftrace like this:

bpftrace -e '
    tracepoint:syscalls:sys_enter_munmap /comm == "stress-ng"/ {
        @start[tid] = nsecs;
    }
    tracepoint:syscalls:sys_exit_munmap /@start[tid]/ {
        @ns[comm] = hist(nsecs - @start[tid]);
        delete(@start[tid]);
    }
    interval:s:1 { exit(); }'

The results are as follows:

without patch:

@ns[stress-ng]:
[1K, 2K)           99566 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)           77756 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            |
[4K, 8K)           32545 |@@@@@@@@@@@@@@@@                                    |
[8K, 16K)            442 |                                                    |
[16K, 32K)            69 |                                                    |
[32K, 64K)             1 |                                                    |
[64K, 128K)            1 |                                                    |
[128K, 256K)          14 |                                                    |
[256K, 512K)          14 |                                                    |
[512K, 1M)            68 |                                                    |

with patch:

@ns[stress-ng]:
[512, 1K)             69 |                                                    |
[1K, 2K)           53921 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[2K, 4K)           47088 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@       |
[4K, 8K)           20583 |@@@@@@@@@@@@@@@@@@@                                 |
[8K, 16K)            659 |                                                    |
[16K, 32K)            93 |                                                    |
[32K, 64K)            24 |                                                    |
[64K, 128K)           14 |                                                    |
[128K, 256K)           6 |                                                    |
[256K, 512K)          10 |                                                    |
[512K, 1M)            29 |                                                    |

It doesn't seem to have much effect on munmap.

Thanks,
Qi
