Cc: Steven Rostedt and Suresh Siddha

Hi Peter,

> On Aug 23, 2019, at 2:36 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Aug 22, 2019 at 10:23:35PM -0700, Song Liu wrote:
>> As 4k pages check was removed from cpa [1], set_kernel_text_rw() leads to
>> split_large_page() for all kernel text pages. This means a single kprobe
>> will put all kernel text in 4k pages:
>>
>> root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>> 0xffffffff81000000-0xffffffff82400000 20M ro PSE x pmd
>>
>> root@ ~# echo ONE_KPROBE >> /sys/kernel/debug/tracing/kprobe_events
>> root@ ~# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
>>
>> root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>> 0xffffffff81000000-0xffffffff82400000 20M ro x pte
>>
>> To fix this issue, introduce CPA_FLIP_TEXT_RW to bypass "Text RO" check
>> in static_protections().
>>
>> Two helper functions set_text_rw() and set_text_ro() are added to flip
>> _PAGE_RW bit for kernel text.
>>
>> [1] commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
>
> ARGH; so this is because ftrace flips the whole kernel range to RW and
> back for giggles? I'm thinking _that_ is a bug, it's a clear W^X
> violation.

Thanks for your comments. Yes, it is related to ftrace, as we have
CONFIG_KPROBES_ON_FTRACE. However, after digging around, I am not sure
what is the expected behavior.

Kernel text region has two mappings to it. For x86_64 and four-level
page table, there are:

1. kernel identity mapping, from 0xffff888000100000;
2. kernel text mapping, from 0xffffffff81000000.

Per comments in arch/x86/mm/init_64.c:set_kernel_text_rw():

	/*
	 * Make the kernel identity mapping for text RW. Kernel text
	 * mapping will always be RO. Refer to the comment in
	 * static_protections() in pageattr.c
	 */
	set_memory_rw(start, (end - start) >> PAGE_SHIFT);

kprobe (with CONFIG_KPROBES_ON_FTRACE) should work on kernel identity
mapping.
However, my experiment shows that kprobe actually operates on the kernel
text mapping (0xffffffff81000000-). It is the same w/ and w/o
CONFIG_KPROBES_ON_FTRACE. Therefore, I am not sure whether the comment is
outdated (it is 10 years old), or kprobe is doing something wrong.

More information about the issue we are looking at: we found that with a
5.2 kernel (no CONFIG_PAGE_TABLE_ISOLATION, w/ CONFIG_KPROBES_ON_FTRACE),
a single kprobe will split _all_ PMDs in the kernel text mapping into
pte-mapped pages. This increases the iTLB miss rate from about 300 per
million instructions to about 700 per million instructions (for the
application I test with).

Per bisect, we found this behavior happens after commit 585948f4f695
("x86/mm/cpa: Avoid the 4k pages check completely"). That's why I proposed
this PATCH to fix/workaround this issue. However, per Peter's comment and
my study of the code, this doesn't seem to be the real problem, or the
only one here.

I also tested that the PMD split issue doesn't happen w/o
CONFIG_KPROBES_ON_FTRACE.

In summary, I have the following questions:

1. Which mapping should kprobe work on? Kernel identity mapping or
   kernel text mapping?
2. FTRACE causes split of PMD mapped kernel text. How should we fix this?

Thanks,
Song
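
P.S. For a rough sense of the scale of the split in the page_tables dump
quoted above, the entry counts can be estimated with a short sketch. This
is only illustrative arithmetic, not part of the patch: it assumes the
usual x86_64 sizes (2 MiB per PMD large page, 4 KiB per PTE page), and the
helper name entry_counts() is hypothetical.

```python
# Illustrative sketch only: estimate how many page-table entries cover the
# kernel text range before and after the split shown in the dump.
# Assumption: x86_64 with 2 MiB PMD large pages and 4 KiB base pages.
PMD_SIZE = 2 * 1024 * 1024  # bytes mapped by one PMD-level large page
PTE_SIZE = 4 * 1024         # bytes mapped by one PTE-level page


def entry_counts(start, end):
    """Return (pmd_entries, pte_entries) needed to map [start, end)."""
    size = end - start
    return size // PMD_SIZE, size // PTE_SIZE


# Range taken from the /sys/kernel/debug/page_tables/kernel output above.
pmds, ptes = entry_counts(0xFFFFFFFF81000000, 0xFFFFFFFF82400000)
print(f"{pmds} PMD entries become {ptes} PTE entries after the split")
```

Under these assumptions, the 20M text range goes from a handful of
large-page entries to thousands of 4k entries, which is in line with the
higher iTLB miss rate reported above.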