[Added more people in cc] On 10/08/23 at 12:19pm, John Ogness wrote: > Hi Petr, > > On 2023-10-06, Petr Mladek <pmladek@xxxxxxxx> wrote: > >> During the demo at LPC2022 we had the situation that there was a large > >> backlog when a WARN was hit. With current mainline the first line of the > >> WARN is put into the ringbuffer and then the entire backlog is flushed > >> before storing the rest of the WARN into the ringbuffer. At the time it > >> was obvious that we should finish storing the WARN message and then > >> start flushing the backlog. > > > > This talks about the "emergency" context (WARN/OOPS/watchdog). > > The system might be in big troubles but it would still try to continue. > > > > Do we really want to defer the flush also for panic() context? > > We can start flushing right after the backtrace is in the > ringbuffer. But flushing the backlog _before_ putting the backtrace into > the ringbuffer was not desired because if there is a large backlog, the > machine may not survive to get the backtrace out. And in that case it > won't even be in the ringbuffer to be used by other debugging > tools. > > > I ask because I was not on LPC 2022 in person and I do not remember > > all details. > > The LPC2022 demo/talk was recorded: > > https://www.youtube.com/watch?v=TVhNcKQvzxI > > At 55:55 is where the situation occurred and triggered the conversation, > ultimately leading to this new feature. > > You may also want to reread my summary: > > https://lore.kernel.org/lkml/875yheqh6v.fsf@xxxxxxxxxxxxxxxxxxxxx > > as well as Linus' follow-up message: > > https://lore.kernel.org/lkml/CAHk-=wieXPMGEm7E=Sz2utzZdW1d=9hJBwGYAaAipxnMXr0Hvg@xxxxxxxxxxxxxx > > > But it is tricky in panic(), see 8th patch at > > https://lore.kernel.org/r/20230919230856.661435-9-john.ogness@xxxxxxxxxxxxx > > > > + nbcon_atomic_exit() is called only in one code path. > > Correct. When panic() is complete and the machine goes into its infinite > loop. This is also the point where it will attempt an unsafe flush, if > it could not get the messages out yet. > > > + nbcon_atomic_flush_all() is used in other paths. It looks like > > a "Whack a mole" game to me. > > Several different outputs occur during panic(). The flush is everywhere > where something significant has been put into the ringbuffer and now it > would be OK to flush it. > > > + messages are never emitted by printk kthread either because > > CPUs are stopped or the kthread is not allowed to get the lock > > Correct. > > > I see only one positive of the explicit flush. The consoles would > > not delay crash_exec() and the crash dump might be closer to > > the point where panic() was called. > > It's only about getting the critical messages into the ringbuffer before > flushing. And since various things can go wrong during the many actions > within panic(), it makes sense to flush in between those actions. > > > Otherwise I see only negatives => IMHO, we want to flush atomic > > consoles synchronously from printk() in panic(). > > > > Does anyone really want explicit flushes in panic()? > > So far you are the only one speaking against it. I expect as time goes > on it will get even more complex as it becomes tunable (also something > we talked about during the demo). Flush consoles in panic kexec case sounds not good, but I have no deep understanding about the atomic printk series, added kexec list and reviewers in cc. > > John > Thanks Dave _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec