> On Oct 17, 2019, at 7:28 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: > > On 10/17, Song Liu wrote: >> >> >>> On Oct 17, 2019, at 1:47 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: >>> >>> On 10/16, Song Liu wrote: >>>> >>>>> On Oct 16, 2019, at 5:10 AM, Oleg Nesterov <oleg@xxxxxxxxxx> wrote: >>>>> >>>>>> @@ -489,6 +492,9 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, >>>>>> if (ret <= 0) >>>>>> goto put_old; >>>>>> >>>>>> + WARN(!is_register && PageCompound(old_page), >>>>>> + "uprobe unregister should never work on compound page\n"); >>>>> >>>>> But this can happen with the change above. You can't know if *vaddr was >>>>> previously changed by install_breakpoint() or not. >>>> >>>>> If not, verify_opcode() should likely save us, but we can't rely on it. >>>>> Say, someone can write "int3" into vm_file at uprobe->offset. >>>> >>>> I think this won't really happen. With is_register == false, we already >>>> know opcode is not "int3", so current call must be from set_orig_insn(). >>>> Therefore, old_page must be installed by uprobe, and cannot be compound. >>>> >>>> The other way is not guaranteed. With is_register == true, it is still >>>> possible current call is from set_orig_insn(). However, we do not rely >>>> on this path. >>> >>> Quite contrary. >>> >>> When is_register == true we know that a) the caller is install_breakpoint() >>> and b) the original insn is NOT int3 unless this page was alreadt COW'ed by >>> userspace, say, by gdb. >>> >>> If is_register == false we only know that the caller is remove_breakpoint(). >>> We can't know if this page was COW'ed by uprobes or userspace, we can not >>> know if the insn we are going to replace is int3 or not, thus we can not >>> assume that verify_opcode() will fail and save us. >> >> So the case we worry about is: >> old_page is COW by user space, > > no, in this case the page shouldn't be huge, > >> target insn is int3, and it is a huge page; >> then uprobe calls remove_breakpoint(); > > Yes, > >> Yeah, I guess this could happen. > > Yes, > >> For the fix, I guess return -Esomething in such case should be sufficient? > > this is what I tried to suggest from the very beginning. > Thanks Oleg! Attached is v2 of 4/4. Song ============================ 8< =============================== >From 0ab451f31b8fa6315d9de86eebe5d6c44dabac7e Mon Sep 17 00:00:00 2001 From: Song Liu <songliubraving@xxxxxx> Date: Tue, 15 Oct 2019 22:38:55 -0700 Subject: [PATCH v2 4/4] uprobe: only do FOLL_SPLIT_PMD for uprobe register Attaching uprobe to text section in THP splits the PMD mapped page table into PTE mapped entries. On uprobe detach, we would like to regroup PMD mapped page table entry to regain performance benefit of THP. However, the regroup is broken For perf_event based trace_uprobe. This is because perf_event based trace_uprobe calls uprobe_unregister twice on close: first in TRACE_REG_PERF_CLOSE, then in TRACE_REG_PERF_UNREGISTER. The second call will split the PMD mapped page table entry, which is not the desired behavior. Fix this by only use FOLL_SPLIT_PMD for uprobe register case. Add a WARN() to confirm uprobe unregister never work on huge pages, and abort the operation when this WARN() triggers. Fixes: 5a52c9df62b4 ("uprobe: use FOLL_SPLIT_PMD instead of FOLL_SPLIT") Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Cc: Srikar Dronamraju <srikar@xxxxxxxxxxxxxxxxxx> Cc: Oleg Nesterov <oleg@xxxxxxxxxx> Signed-off-by: Song Liu <songliubraving@xxxxxx> --- Changes v1 -> v2: Return -EINVAL if the WARN() triggers. (Oleg Nesterov) --- kernel/events/uprobes.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 94d38a39d72e..c74761004ee5 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -474,14 +474,17 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, struct vm_area_struct *vma; int ret, is_register, ref_ctr_updated = 0; bool orig_page_huge = false; + unsigned int gup_flags = FOLL_FORCE; is_register = is_swbp_insn(&opcode); uprobe = container_of(auprobe, struct uprobe, arch); retry: + if (is_register) + gup_flags |= FOLL_SPLIT_PMD; /* Read the page with vaddr into memory */ - ret = get_user_pages_remote(NULL, mm, vaddr, 1, - FOLL_FORCE | FOLL_SPLIT_PMD, &old_page, &vma, NULL); + ret = get_user_pages_remote(NULL, mm, vaddr, 1, gup_flags, + &old_page, &vma, NULL); if (ret <= 0) return ret; @@ -489,6 +492,12 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, if (ret <= 0) goto put_old; + if (WARN(!is_register && PageCompound(old_page), + "uprobe unregister should never work on compound page\n")) { + ret = -EINVAL; + goto put_old; + } + /* We are going to replace instruction, update ref_ctr. */ if (!ref_ctr_updated && uprobe->ref_ctr_offset) { ret = update_ref_ctr(uprobe, mm, is_register ? 1 : -1); -- 2.17.1