On 2019-02-27, Petr Mladek <pmladek@xxxxxxxx> wrote:
>>>> Implement a non-sleeping NMI-safe write_atomic console function in
>>>> order to support emergency printk messages.
>>>
>>> OK, it would be safe when prb_lock() is the only lock taken
>>> in the NMI handler.
>>
>> Which is the case. As I wrote to you already [0], NMI contexts are
>> _never_ allowed to do things that rely on waiting forever for other
>> CPUs.
>
> Who says _never_? I agree that it is not reasonable. But the
> history shows that it happens.

Right, which is why it would need to become policy.

The emergency messages (aka write_atomic) introduce a new requirement to
the kernel because this callback must be callable from any context. The
console drivers must have some way of synchronizing. The CPU-reentrant
spin lock is the only solution I am aware of.

> In principle, there is nothing wrong in using spinlock in NMI
> when it is used only in NMI.

The CPU-reentrant spin lock _will_ be used in NMI context and
potentially could be used from any line of NMI code (if, for example, a
panic is triggered). The problem is when you have 2 different spin locks
in NMI context and their ordering cannot be guaranteed. And since I am
introducing an implicit spin lock that potentially could be locked from
any line of code, any explicit use of a spin lock in NMI code would
really be adding a 2nd spin lock and thus deadlock potential.

If the ringbuffer was fully lockless, we should be able to have
per-console CPU-reentrant spin locks as long as the ordering is
preserved, which I expect shouldn't be a problem. If any NMI context
needed a spin lock for its own purposes, it would need to use the
CPU-reentrant spin lock of the first console so as to preserve the
ordering in case of a panic.

>>> 2. I am afraid that we need to add some locking between CPUs
>>> to avoid mixing characters from directly printed messages.
>>
>> That is exactly what console_atomic_lock() (actually prb_lock) is!
>
> Sure. But it should not be a common lock for the ring buffer and
> all consoles.

As long as the ring buffer requires a CPU-reentrant spin lock, I expect
that it _must_ be a common lock for all. Consider the situation that the
ring buffer writer code causes a panic. I think it is beneficial if at
least 1 level of printk recursion is supported so that even these
backtraces make it out on the emergency consoles.

If the ring buffer becomes fully lockless, then we could move to
per-console CPU-reentrant spin locks.

John Ogness
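
For illustration only, here is a minimal user-space sketch of the idea
behind a CPU-reentrant spin lock as discussed above: the lock records
which CPU owns it, other CPUs spin, and the owning CPU may take it again
(for example from an NMI or a panic on that same CPU) without
deadlocking. This is not the prb_lock() code from the patch series. The
names cpu_lock, cpu_lock_init, cpu_lock_acquire and cpu_lock_release are
hypothetical, the CPU id is passed in by the caller as an assumption, and
C11 atomics stand in for the kernel's own primitives.

```c
#include <assert.h>
#include <stdatomic.h>

#define NO_OWNER (-1)

/* Hypothetical lock type: holds the id of the owning CPU, or NO_OWNER. */
struct cpu_lock {
	atomic_int owner;
};

static void cpu_lock_init(struct cpu_lock *l)
{
	atomic_init(&l->owner, NO_OWNER);
}

/*
 * Acquire the lock on behalf of @cpu.
 *
 * Returns the owner value seen at acquisition time: NO_OWNER for the
 * outermost acquisition, or @cpu when this CPU already held the lock
 * (nested acquisition). The caller passes that value to
 * cpu_lock_release().
 */
static int cpu_lock_acquire(struct cpu_lock *l, int cpu)
{
	for (;;) {
		int seen = NO_OWNER;

		if (atomic_compare_exchange_strong_explicit(&l->owner, &seen, cpu,
							    memory_order_acquire,
							    memory_order_relaxed))
			return NO_OWNER;	/* lock was free, now ours */

		if (seen == cpu)
			return cpu;		/* re-entry on the owning CPU */

		/* Another CPU owns the lock: spin and retry. */
	}
}

/*
 * Release by restoring the owner value returned by the matching acquire.
 * A nested release leaves the lock held; only the outermost release
 * (prev == NO_OWNER) actually frees it.
 */
static void cpu_lock_release(struct cpu_lock *l, int prev)
{
	assert(atomic_load(&l->owner) != NO_OWNER);
	atomic_store_explicit(&l->owner, prev, memory_order_release);
}
```

With a single lock of this kind, a nested acquisition on the same CPU
simply succeeds. The ordering problem described above appears only once a
second, independent lock is taken in NMI context. A kernel version would
also have to keep the caller from migrating to another CPU while it holds
the lock, which this sketch does not model.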