On Fri, 20 Jul 2018, Thomas Gleixner wrote:
> On Fri, 20 Jul 2018, Andy Lutomirski wrote:
> > On Fri, Jul 20, 2018 at 12:27 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > > On Fri, 20 Jul 2018, Andy Lutomirski wrote:
> > >> > On Jul 20, 2018, at 6:22 AM, Joerg Roedel <joro@xxxxxxxxxx> wrote:
> > >> >
> > >> > From: Joerg Roedel <jroedel@xxxxxxx>
> > >> >
> > >> > The ring-buffer is accessed in the NMI handler, so we better
> > >> > avoid faulting on it. Sync the vmalloc range with all
> > >> > page-tables in the system to make sure everyone has it mapped.
> > >> >
> > >> > This fixes a WARN_ON_ONCE() that can be triggered with PTI
> > >> > enabled on x86-32:
> > >> >
> > >> >   WARNING: CPU: 4 PID: 0 at arch/x86/mm/fault.c:320 vmalloc_fault+0x220/0x230
> > >> >
> > >> > This triggers because with PTI enabled on a PAE kernel the
> > >> > PMDs are no longer shared between the page-tables, so the
> > >> > vmalloc changes do not propagate automatically.
> > >>
> > >> It seems like it would be much more robust to fix the vmalloc_fault()
> > >> code instead.
> > >
> > > Right, but for now the obvious fix for the issue at hand is this. We surely
> > > should revisit this.
> >
> > If you commit this under this reasoning, then please at least make it say:
> >
> > /* XXX: The vmalloc_fault() code is buggy on PTI+PAE systems, and this
> >  * is a workaround. */
> >
> > Let's not have code in the kernel that pretends to make sense but is
> > actually voodoo magic that works around bugs elsewhere. It's no fun
> > to maintain down the road.
>
> Fair enough. Lemme amend it.

Joerg is looking into it, but I surely want to get that stuff some
exposure in next ASAP.

Delta patch below.

Thanks,

	tglx

8<-------------

--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -815,8 +815,12 @@ static void rb_free_work(struct work_str
 	vfree(base);
 	kfree(rb);
 
-	/* Make sure buffer is unmapped in all page-tables */
-	vmalloc_sync_all();
+	/*
+	 * FIXME: PAE workaround for vmalloc_fault(): Make sure buffer is
+	 * unmapped in all page-tables.
+	 */
+	if (IS_ENABLED(CONFIG_X86_PAE))
+		vmalloc_sync_all();
 }
 
 void rb_free(struct ring_buffer *rb)
@@ -844,11 +848,13 @@ struct ring_buffer *rb_alloc(int nr_page
 		goto fail_all_buf;
 
 	/*
-	 * The buffer is accessed in NMI handlers, make sure it is
-	 * mapped in all page-tables in the system so that we don't
-	 * fault on the range in an NMI handler.
+	 * FIXME: PAE workaround for vmalloc_fault(): The buffer is
+	 * accessed in NMI handlers, make sure it is mapped in all
+	 * page-tables in the system so that we don't fault on the range in
+	 * an NMI handler.
 	 */
-	vmalloc_sync_all();
+	if (IS_ENABLED(CONFIG_X86_PAE))
+		vmalloc_sync_all();
 
 	rb->user_page = all_buf;
 	rb->data_pages[0] = all_buf + PAGE_SIZE;
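
[Editor's note: for readers following the thread, a minimal sketch of the
pattern the delta patch applies. nmi_safe_vmalloc() is a hypothetical
helper name for illustration only, not an existing kernel API;
vmalloc(), vmalloc_sync_all() and IS_ENABLED() are the real interfaces
the patch above uses.]

#include <linux/kernel.h>
#include <linux/vmalloc.h>

/*
 * Hypothetical helper (illustration only): allocate a buffer that may
 * later be touched from NMI context. vmalloc_fault() cannot safely be
 * taken from an NMI, and on PTI+PAE kernels the kernel PMDs are no
 * longer shared between page-tables, so vmalloc mappings do not
 * propagate lazily. Sync the mapping into all page-tables up front,
 * before the buffer can ever be accessed from an NMI handler.
 */
static void *nmi_safe_vmalloc(unsigned long size)
{
	void *buf = vmalloc(size);

	if (!buf)
		return NULL;

	/*
	 * Workaround mirroring the delta patch: only PAE page-tables
	 * need the explicit sync.
	 */
	if (IS_ENABLED(CONFIG_X86_PAE))
		vmalloc_sync_all();

	return buf;
}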