On (10/04/18 10:36), Petr Mladek wrote: > > This looks like a reasonable explanation of what is happening here. > It also explains why the console owner logic helped. Well, I'm still a bit puzzled, frankly speaking. I've two theories. Theory #1 [most likely] Steven is a wizard and his code cures whatever problem we throw it at. Theory #2 console_sem hand over actually spreads print out, so we don't have one CPU doing all the printing job. Instead every CPU prints its backtrace, while the CPU which issued all_cpus_backtrace() waits for them. So all_cpus_backtrace() still has to wait for NR_CPUS * strlen(bakctrace), which still probably truggers NMI panic on it at some point. The panic CPU send out stop IPI, then it waits for foreign CPUs to ACK stop IPI request - for 10 seconds. So each CPU prints its backtrace, then ACK stop IPI. So when panic CPU proceeds with flush_on_panic() and emergency_reboot() uart_port->lock is unlocked. Without the patch we probably declare NMI panic on the CPU which does all the printing work, and panic sometimes jumps in when that CPU is in busy in serial8250_console_write(), holding the uart_port->lock. So we can't re-enter the 8250 driver from panic CPU and we can't reboot the system. In other words... Steven is a wizard. > > serial8250_console_write() > > { > > if (port->sysrq) > > locked = 0; > > else if (oops_in_progress) > > locked = spin_trylock_irqsave(&port->lock, flags); > > else > > spin_lock_irqsave(&port->lock, flags); > > > > ... > > uart_console_write(port, s, count, serial8250_console_putchar); > > ... > > > > if (locked) > > spin_unlock_irqrestore(&port->lock, flags); > > } > > > > Now... the problem. A theory, in fact. > > panic() sets oops_in_progress back to zero - bust_spinlocks(0) - too soon. > > I see your point. I am just a bit scared of this way. Ignoring locks > is a dangerous and painful approach in general. Well, I agree. But 8250 is not the only console which does ignore uart_port lock state sometimes. Otherwise sysrq would be totally unreliable, including emergency reboot. So it's sort of how it has been for quite some time, I guess. We are in panic(), it's over, so we probably can ignore uart_port->lock at this point. -ss