Re: [PATCH] x86/mm: Do not split_large_page() for set_kernel_text_rw()

Cc: Steven Rostedt and Suresh Siddha

Hi Peter, 

> On Aug 23, 2019, at 2:36 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> 
> On Thu, Aug 22, 2019 at 10:23:35PM -0700, Song Liu wrote:
>> As 4k pages check was removed from cpa [1], set_kernel_text_rw() leads to
>> split_large_page() for all kernel text pages. This means a single kprobe
>> will put all kernel text in 4k pages:
>> 
>>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>>  0xffffffff81000000-0xffffffff82400000     20M  ro    PSE      x  pmd
>> 
>>  root@ ~# echo ONE_KPROBE >> /sys/kernel/debug/tracing/kprobe_events
>>  root@ ~# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
>> 
>>  root@ ~# grep ffff81000000- /sys/kernel/debug/page_tables/kernel
>>  0xffffffff81000000-0xffffffff82400000     20M  ro             x  pte
>> 
>> To fix this issue, introduce CPA_FLIP_TEXT_RW to bypass "Text RO" check
>> in static_protections().
>> 
>> Two helper functions set_text_rw() and set_text_ro() are added to flip
>> _PAGE_RW bit for kernel text.
>> 
>> [1] commit 585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely")
> 
> ARGH; so this is because ftrace flips the whole kernel range to RW and
> back for giggles? I'm thinking _that_ is a bug, it's a clear W^X
> violation.

Thanks for your comments. Yes, it is related to ftrace, as we have
CONFIG_KPROBES_ON_FTRACE. However, after digging around, I am not sure
what the expected behavior is.

The kernel text region has two mappings. On x86_64 with four-level
page tables, they are:

	1. kernel identity mapping, from 0xffff888000100000;
	2. kernel text mapping, from 0xffffffff81000000.

Per the comment in arch/x86/mm/init_64.c:set_kernel_text_rw():

        /*
         * Make the kernel identity mapping for text RW. Kernel text
         * mapping will always be RO. Refer to the comment in
         * static_protections() in pageattr.c
         */
	set_memory_rw(start, (end - start) >> PAGE_SHIFT);

According to this comment, kprobe (with CONFIG_KPROBES_ON_FTRACE)
should operate on the kernel identity mapping.

However, my experiments show that kprobe actually operates on the
kernel text mapping (0xffffffff81000000-). This is the same with and
without CONFIG_KPROBES_ON_FTRACE. Therefore, I am not sure whether the
comment is outdated (it is 10 years old) or kprobe is doing something
wrong.


Here is more information about the issue we are looking at.

We found that with the 5.2 kernel (no CONFIG_PAGE_TABLE_ISOLATION, with
CONFIG_KPROBES_ON_FTRACE), a single kprobe splits _all_ PMDs in the
kernel text mapping into pte-mapped pages. This increases the iTLB
miss rate from about 300 per million instructions to about 700 per
million instructions (for the application I tested with).
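
To put these numbers in perspective, the arithmetic can be sketched in
user-space C. This is an illustration only, not kernel code: the
helpers mappings_needed() and increase_percent() are hypothetical
names, the 2M and 4k page sizes are the usual x86_64 values, and the
address range is the one shown in the page_tables dump quoted above.

```c
#include <stdint.h>
#include <stdio.h>

/* Kernel text range as reported in the quoted page_tables dump. */
#define TEXT_START 0xffffffff81000000ULL
#define TEXT_END   0xffffffff82400000ULL

#define SIZE_4K (4ULL << 10)	/* pte-mapped page size */
#define SIZE_2M (2ULL << 20)	/* pmd-mapped (PSE) page size */

/* Number of mappings needed to cover 'bytes' with pages of 'page_size'. */
uint64_t mappings_needed(uint64_t bytes, uint64_t page_size)
{
	return (bytes + page_size - 1) / page_size;
}

/* Relative increase from 'before' to 'after', in whole percent. */
uint64_t increase_percent(uint64_t before, uint64_t after)
{
	return (after - before) * 100 / before;
}

/* Print the before/after picture for the quoted 20M text range. */
void print_text_mapping_summary(void)
{
	uint64_t bytes = TEXT_END - TEXT_START;

	printf("text size:     %llu MiB\n",
	       (unsigned long long)(bytes >> 20));
	printf("2M mappings:   %llu\n",
	       (unsigned long long)mappings_needed(bytes, SIZE_2M));
	printf("4k mappings:   %llu\n",
	       (unsigned long long)mappings_needed(bytes, SIZE_4K));
	printf("iTLB miss inc: %llu%%\n",
	       (unsigned long long)increase_percent(300, 700));
}
```

Calling print_text_mapping_summary() from main() prints the summary.
The same 20M of text goes from a handful of 2M mappings to thousands
of 4k mappings, which is consistent with the higher iTLB miss rate
measured above.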

By bisecting, we found that this behavior started with commit
585948f4f695 ("x86/mm/cpa: Avoid the 4k pages check completely").
That's why I proposed this PATCH to fix or work around the issue.
However, per Peter's comment and my study of the code, this doesn't
seem to be the real problem, or at least not the only one.

I also verified that the PMD split does not happen without
CONFIG_KPROBES_ON_FTRACE.


In summary, I have the following questions:

1. Which mapping should kprobe work on: the kernel identity mapping or
   the kernel text mapping?
2. FTRACE causes the PMD-mapped kernel text to be split. How should we
   fix this?

Thanks,
Song







