Hi Tao, On 17/06/22 18:42, Tao Zhou wrote: > Hi Valentin, > > On Thu, Jun 16, 2022 at 01:37:09PM +0100, Valentin Schneider wrote: >> @@ -964,24 +966,31 @@ late_initcall(kexec_core_sysctl_init); >> */ >> void __noclone __crash_kexec(struct pt_regs *regs) >> { >> - /* Take the kexec_mutex here to prevent sys_kexec_load >> - * running on one cpu from replacing the crash kernel >> - * we are using after a panic on a different cpu. >> + /* >> + * This should be taking kexec_mutex before doing anything with the >> + * kexec_crash_image, but this code can be run in NMI context which >> + * means we can't even trylock. >> * >> - * If the crash kernel was not located in a fixed area >> - * of memory the xchg(&kexec_crash_image) would be >> - * sufficient. But since I reuse the memory... >> + * Pairs with smp_mb() in do_kexec_load() and sys_kexec_file_load() >> */ >> - if (mutex_trylock(&kexec_mutex)) { >> - if (kexec_crash_image) { >> - struct pt_regs fixed_regs; >> - >> - crash_setup_regs(&fixed_regs, regs); >> - crash_save_vmcoreinfo(); >> - machine_crash_shutdown(&fixed_regs); >> - machine_kexec(kexec_crash_image); >> - } >> - mutex_unlock(&kexec_mutex); >> + WRITE_ONCE(panic_wants_kexec, true); >> + smp_mb(); >> + /* >> + * If we're panic'ing while someone else is messing with the crash >> + * kernel, this isn't going to end well. >> + */ >> + if (READ_ONCE(kexec_loading)) { >> + WRITE_ONCE(panic_wants_kexec, false); >> + return; >> + } > > So this is from NMI. The mutex guarantee that kexec_file_load() or > do_kexec_load() just one of them beat on cpu. NMI can happen on more > than one cpu. That means that here be cumulativity here IMHO. > If you look at __crash_kexec() in isolation yes, but if you look at panic() and nmi_panic() only a single NMI can get in there. I think that is also true for invocations via crash_kexec(). > kexec_file_load()/ NMI0 NMI1.. > do_kexec_load() > > set kexec_loading=true > smp_mb() set panic_wants_kexec=ture > smp_mb() > see kexec_loading=ture and > conditionally set > panic_wants_kexec=false; > set panic_wants_kexec=ture > smp_mb() > see panic_wants_kexec=ture > conditionally set > kexec_loading=false > see kexec_loading=false > do kexec nmi things. > > You see conditionlly set kexec_loading or panic_wants_kexec there no barrier > there and if the cumulativity to have the effect there should be a acquire-release, > if I am not wrong. > > __crash_kexec(): > > WRITE_ONCE(panic_wants_kexec, true); > smp_mb(); > /* > * If we're panic'ing while someone else is messing with the crash > * kernel, this isn't going to end well. > */ > if (READ_ONCE(kexec_loading)) { > smp_store_release(panic_wants_kexec, false); > return; > } > > kexec_file_load()/do_kexec_load(): > > WRITE_ONCE(kexec_loading, true); > smp_mb(); > if (smp_load_acquire(panic_wants_kexec)) { > WRITE_ONCE(kexec_loading, false); > ... > } > > For those input, I'm sure I lost and feel hot.. > I thought that change the patten to load-store and set initial > value but failed. > I'm not sure if further ordering is required here, the base case being WRITE_ONCE(panic_wants_kexec, true); WRITE_ONCE(kexec_loading); smp_mb(); smp_mb(); v0 = READ_ONCE(kexec_loading); v1 = READ_ONCE(panic_wants_kexec); (see SB+fencembonceonces litmus test) Wich ensures (!v0 && !v1) is never true. If modified to: WRITE_ONCE(panic_wants_kexec, true); WRITE_ONCE(kexec_loading); smp_mb(); smp_mb(); v0 = READ_ONCE(kexec_loading); v1 = READ_ONCE(panic_wants_kexec); if (v0) if (v1) WRITE_ONCE(panic_wants_kexec, false); WRITE_ONCE(kexec_loading, false); then "(!v0 && !v1) is never true" still holds, so the exclusivity is maintained AFAICT.