On Mon, Nov 26, 2018 at 9:04 AM Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote: > > So I triggered a bug in x86 code. First the "okay" backtrace: > |BUG: GPF in non-whitelisted uaccess (non-canonical address?) > |general protection fault: 0000 [#1] PREEMPT SMP NOPTI > |CPU: 26 PID: 2236 Comm: sig-xstate-bum Not tainted 4.20.0-rc3 #45 > |RIP: 0010:__fpu__restore_sig+0x1c1/0x540 > |Call Trace: > | fpu__restore_sig+0x28/0x40 > | restore_sigcontext+0x13a/0x180 > | __ia32_sys_rt_sigreturn+0xae/0x100 > | do_syscall_64+0x4f/0x100 > | entry_SYSCALL_64_after_hwframe+0x44/0xa9 > |RIP: 0033:0x7f9b06aea227 > |---[ end trace a45ac23b593e9ab0 ]--- This bug got handled by Jann Horn, yes? (I remember seeing a related thread go by...) > > and now the not okay part: > > |BUG: sleeping function called from invalid context at kernel/sched/completion.c:99 > |in_atomic(): 1, irqs_disabled(): 1, pid: 2236, name: sig-xstate-bum > |Preemption disabled at: > |[<ffffffff99d60512>] pstore_dump+0x72/0x330 > |CPU: 26 PID: 2236 Comm: sig-xstate-bum Tainted: G D 4.20.0-rc3 #45 > |Call Trace: > | dump_stack+0x4f/0x6a > | ___might_sleep.cold.91+0xd3/0xe4 > | __might_sleep+0x50/0x90 > | wait_for_completion+0x32/0x130 > | virt_efi_query_variable_info+0x14e/0x160 > | efi_query_variable_store+0x51/0x1a0 > | efivar_entry_set_safe+0xa3/0x1b0 > | efi_pstore_write+0x109/0x140 Eww. :) > | pstore_dump+0x11c/0x330 > | kmsg_dump+0xa4/0xd0 > | oops_exit+0x22/0x30 > | oops_end+0x67/0xd0 > | die+0x41/0x4a > | do_general_protection+0xc1/0x150 > | general_protection+0x1e/0x30 > |RIP: 0010:__fpu__restore_sig+0x1c1/0x540 > | fpu__restore_sig+0x28/0x40 > | restore_sigcontext+0x13a/0x180 > | __ia32_sys_rt_sigreturn+0xae/0x100 > | do_syscall_64+0x4f/0x100 > | entry_SYSCALL_64_after_hwframe+0x44/0xa9 > |RIP: 0033:0x7f9b06aea227 > … [ moar backtrace ] > > This looks like it comes from commit 21b3ddd39fee ("efi: Don't use > spinlocks for efi vars") because it replaced a spin_lock() with > down_interruptible() in this case. In this case, since pstore_dump() > holds psinfo->buf_lock with disabled interrupts we can't block… > > I have this as a workaround: I'm not sure this is correct... > diff --git a/drivers/firmware/efi/vars.c b/drivers/firmware/efi/vars.c > index 9336ffdf6e2c..d4dcfe46eb2e 100644 > --- a/drivers/firmware/efi/vars.c > +++ b/drivers/firmware/efi/vars.c > @@ -755,7 +755,9 @@ int efivar_entry_set_safe(efi_char16_t *name, efi_guid_t vendor, u32 attributes, > return -EINTR; > } > > - status = check_var_size(attributes, size + ucs2_strsize(name, 1024)); > + status = ops->query_variable_store(attributes, > + size + ucs2_strsize(name, 1024), > + block); Why does this change help? > if (status != EFI_SUCCESS) { > up(&efivars_lock); > return -ENOSPC; > diff --git a/fs/pstore/platform.c b/fs/pstore/platform.c > index b821054ca3ed..9aa27a2c8d36 100644 > --- a/fs/pstore/platform.c > +++ b/fs/pstore/platform.c > @@ -127,10 +127,10 @@ static const char *get_reason_str(enum kmsg_dump_reason reason) > bool pstore_cannot_block_path(enum kmsg_dump_reason reason) > { > /* > - * In case of NMI path, pstore shouldn't be blocked > - * regardless of reason. > + * In case of disabled preemption / interrupts we can't block on any > + * lock regardless of reason. > */ > - if (in_nmi()) > + if (in_atomic() || irqs_disabled()) > return true; > > switch (reason) { I'd like to avoid any case where pstore might "miss" a crash report, though... this seems to make it easier for it to fail to record a crash, yes? I have some pending clean-ups in this area (buf_lock may not be needed at all, actually), so I'll try to work through those too. -Kees -- Kees Cook