Re: [PATCH v2] mm: memory: check userfaultfd_wp() in vmf_orig_pte_uffd_wp()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 19, 2024 at 11:00:46AM +0800, Kefeng Wang wrote:
> 
> 
> On 2024/4/19 0:32, Peter Xu wrote:
> > Hi, Kefeng,
> > 
> > On Thu, Apr 18, 2024 at 08:06:41PM +0800, Kefeng Wang wrote:
> > > Add userfaultfd_wp() check in vmf_orig_pte_uffd_wp() to avoid the
> > > unnecessary pte_marker_entry_uffd_wp() in most pagefault, difference
> > > as shows below from perf data of lat_pagefault, note, the function
> > > vmf_orig_pte_uffd_wp() is not inlined in the two kernel versions.
> > > 
> > >    perf report -i perf.data.before | grep vmf
> > >       0.17%     0.13%  lat_pagefault  [kernel.kallsyms]      [k] vmf_orig_pte_uffd_wp.part.0.isra.0
> > >    perf report -i perf.data.after  | grep vmf
> > 
> > Any real number to share too besides the perf greps?  I meant, even if perf
> > report will not report such function anymore, it doesn't mean it'll be
> > faster, and how much it improves?
> 
> dd if=/dev/zero of=/tmp/XXX bs=512M count=1
> ./lat_pagefault -W 5 -N 5 /tmp/XXX
> 
> 	before		after	
> 1	0.2623		0.2605	
> 2	0.2622		0.2598	
> 3	0.2621		0.2595	
> 4	0.2622		0.2600	
> 5	0.2651		0.2598	
> 6	0.2624		0.2594	
> 7	0.2624		0.2605	
> 8	0.2627		0.2608	
> average	0.262675	0.2600375	-0.0026375
> 
> The lat_pagefault does show some improvement(also I reboot and retest,
> the results are same).

Thanks. Could you replace the perf report with these real data?  Or just
append to it.

I had a look at the asm and indeed the current code will generate two
jumps when without this patch, and I don't know why..

   0x0000000000006ac4 <+52>:    test   $0x8,%ah                 <---- check FAULT_FLAG_ORIG_PTE_VALID
   0x0000000000006ac7 <+55>:    jne    0x6bcf <set_pte_range+319>
   0x0000000000006acd <+61>:    mov    0x18(%rbp),%rsi

   ...

   0x0000000000006bcf <+319>:   mov    0x40(%rdi),%rdi
   0x0000000000006bd3 <+323>:   test   $0xffffffffffffff9f,%rdi <---- pte_none() check
   0x0000000000006bda <+330>:   je     0x6acd <set_pte_range+61>
   0x0000000000006be0 <+336>:   test   $0x101,%edi              <---- pte_present() check
   0x0000000000006be6 <+342>:   jne    0x6acd <set_pte_range+61>
   0x0000000000006bec <+348>:   call   0x1c50 <pte_to_swp_entry>
   0x0000000000006bf1 <+353>:   mov    0x0(%rip),%rdx        # 0x6bf8 <set_pte_range+360>
   0x0000000000006bf8 <+360>:   mov    %rax,%r15
   0x0000000000006bfb <+363>:   shr    $0x3a,%rax
   0x0000000000006bff <+367>:   cmp    $0x1f,%rax
   0x0000000000006c03 <+371>:   mov    $0x0,%eax
   0x0000000000006c08 <+376>:   cmovne %rax,%r15
   0x0000000000006c0c <+380>:   mov    0x28(%rbx),%eax
   0x0000000000006c0f <+383>:   and    $0x1,%r15d
   0x0000000000006c13 <+387>:   jmp    0x6acd <set_pte_range+61>

I also don't know why the compiler cannot already merge the none+present
check into one shot, I thought it could.  Also surprised me that
pte_to_swp_entry() is a function call.. but not involved in this context.

So I think I was right it should bypass this when seeing it pte_none,
however that includes two jumps.

And with your patch applied the two jumps are not there:

   0x0000000000006b0c <+124>:   testb  $0x8,0x29(%r14)           <--- FAULT_FLAG_ORIG_PTE_VALID
   0x0000000000006b11 <+129>:   je     0x6b6a <set_pte_range+218>
   0x0000000000006b13 <+131>:   mov    (%r14),%rax
   0x0000000000006b16 <+134>:   testb  $0x10,0x21(%rax)          <--- userfaultfd_wp(vmf->vma) check
   0x0000000000006b1a <+138>:   je     0x6b6a <set_pte_range+218>

Maybe that's what contributes to that 0.x% extra time of a fault.

So if we do care about this 0.x% and we're doing this anyway, perhaps move
the vma check upper?  Because afaict FAULT_FLAG_ORIG_PTE_VALID should
always hit in set_pte_range(), so we can avoid two more insts in the common
paths.

I'll leave that to you too if you want to mention some details in above and
add that into the commit log.

Thanks,

-- 
Peter Xu





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux