On 2024/4/19 0:32, Peter Xu wrote:
Hi, Kefeng,
On Thu, Apr 18, 2024 at 08:06:41PM +0800, Kefeng Wang wrote:
Add a userfaultfd_wp() check in vmf_orig_pte_uffd_wp() to avoid the
unnecessary pte_marker_entry_uffd_wp() call in most page faults. The
difference is shown below from perf data of lat_pagefault. Note that
vmf_orig_pte_uffd_wp() is not inlined in either kernel version.
perf report -i perf.data.before | grep vmf
0.17% 0.13% lat_pagefault [kernel.kallsyms] [k] vmf_orig_pte_uffd_wp.part.0.isra.0
perf report -i perf.data.after | grep vmf
Any real numbers to share besides the perf greps? I mean, even if perf
report no longer shows the function, that doesn't mean it will be
faster. And by how much does it improve?
dd if=/dev/zero of=/tmp/XXX bs=512M count=1
./lat_pagefault -W 5 -N 5 /tmp/XXX
         before    after      diff
1        0.2623    0.2605
2        0.2622    0.2598
3        0.2621    0.2595
4        0.2622    0.2600
5        0.2651    0.2598
6        0.2624    0.2594
7        0.2624    0.2605
8        0.2627    0.2608
average  0.262675  0.2600375  -0.0026375
lat_pagefault does show some improvement (I also rebooted and retested,
and the results are the same).
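For anyone who wants to recheck the summary row, here is a minimal sketch
in Python that recomputes the averages and the relative change from the
eight runs above. The names (before, after, summarize) are mine and purely
illustrative; nothing here comes from lmbench itself. It works out to
roughly a 1% reduction, and every "after" run is below every "before" run.

```python
from statistics import mean

# lat_pagefault results copied from the table above (eight runs each).
before = [0.2623, 0.2622, 0.2621, 0.2622, 0.2651, 0.2624, 0.2624, 0.2627]
after = [0.2605, 0.2598, 0.2595, 0.2600, 0.2598, 0.2594, 0.2605, 0.2608]


def summarize(before, after):
    """Return (mean_before, mean_after, delta, relative change in percent)."""
    mean_before = mean(before)
    mean_after = mean(after)
    delta = mean_after - mean_before
    return mean_before, mean_after, delta, 100.0 * delta / mean_before


mean_before, mean_after, delta, pct = summarize(before, after)
print(f"average before: {mean_before:.6f}")
print(f"average after:  {mean_after:.7f}")
print(f"difference:     {delta:.7f} ({pct:.2f}%)")

# True when the slowest "after" run still beats the fastest "before" run.
print("ranges disjoint:", max(after) < min(before))
```

With only eight runs per kernel this is a sanity check on the arithmetic,
not a statistical test.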
Now we're switching from a pte_marker_uffd_wp() check to a vma flag check.
I think it makes more sense to compare the numbers rather than the perf
reports, as the vma flag check instructions will be buried under other
entries IIUC.
Thanks,