Seiji Aguchi <seiji.aguchi at hds.com> writes:

> Hi,
>
> I'm Seiji Aguchi.
> I work for Hitachi Data Systems.
> This is my first time sending a patch to lkml.
> Nice to meet you.
>
> I found an issue in kexec.
> Please give me your comments and suggestions.
>
> Kexec aborts when two cpus panic at the same time.
> An example scenario:
> 1. Two cpus panic at the same time.
> 2. One cpu, cpu0, gets kexec_mutex in crash_kexec().
> 3. The other cpu, cpu1, can't get kexec_mutex and returns from crash_kexec().
> 4. Cpu0 runs kmsg_dump(KMSG_DUMP_KEXEC).
> 5. Cpu1 can't get dump_list_lock and returns from kmsg_dump(KMSG_DUMP_PANIC).
> 6. Cpu1 runs smp_send_stop() in panic() and sends an IPI to the other cpus.
> 7. Cpu0 may receive the IPI from cpu1 while running kmsg_dump(KMSG_DUMP_KEXEC),
>    crash_setup_regs(), or crash_save_vmcore().
>
> We can solve this issue by disabling external interrupts while getting
> kexec_mutex in crash_kexec().

Disabling interrupts is fine; I thought we did that already at some point.

However, that call to kmsg_dump(KMSG_DUMP_KEXEC) is a bug, as it introduces
locks into a path that should not be taking locks.

Please remove that broken kmsg_dump call as well.  Nothing in the crash_kexec
path should even have the option of blocking.

Eric