On Fri, Sep 06 2024 at 18:44, John Ogness wrote: > So there are 2 things _not_ supported by the write_atomic() callback: > > 1. RS485 mode. This is due to the need to start up TX for the > write, which can lead to: > > up->rs485_start_tx() > serial8250_em485_start_tx() > serial8250_stop_rx() > serial8250_rpm_get() > pm_runtime_get_sync() > __pm_runtime_resume() > spin_lock_irqsave() > > Taking a spin lock is not safe from NMI and thus disqualifies this call > chain for write_atomic(). Correct. __pm_runtime_resume() can sleep as well :) > If UART_CAP_RPM is not set, the pm_runtime_get_sync() is avoided. So I > could only disable atomic RS485 if UART_CAP_RPM is set. But the OMAP > variant of the 8250 does set this capability. Sure, but none of this makes sense. What's so special about that em485 muck that serial8250_stop_rx() needs to do that PM dance? It writes the IER register, which serial8250_console_write() just wrote to in serial8250_clear_IER() without doing this PM dance. So for the console write path this stop part is not required at all. That said, serial8250_em485_stop_tx() doesn't have this PM dance either. I'm 100% that this is just a problem of blindly sharing this with the regular uart code and not because there is a requirement. See what serial8250_console_setup() does at the end: if (port->dev) pm_runtime_get_sync(port->dev); The corresponding put() is in serial8250_console_exit(). So there is absolutely zero reason for power management in the console write functions. It's the usual voodoo programming which nobody noticed because it did not immediately blow up in their face. There is another minor issue in that em485 muck. One code path arms a hrtimer, which does not work from NMI like contexts, but that is only taken when the transmitter is not empty, so probably a non-issue because the console write code waits for it to be drained. There are also a few lockdep_assert_held_once(port->lock) in that code which will trigger when called from the nbcon write functions. They are already broken today when oops_in_progress is set and the trylock of port::lock fails... So splitting this up into a clean and lean set for the console write functions will make all these horrors just go away. The current sharing is just fragile as hell and makes no sense at all. > 2. Modem control. This is due to waiting for inputs, which can lead to: > > serial8250_modem_status() > wake_up_interruptible() > > Performing wakes is not safe from scheduler or NMI and thus disqualifies > this call chain for write_atomic(). > > It would probably be acceptable to move serial8250_modem_status() into > an irq_work. Yes, but serial8250_modem_status() has more problems than that: See uart_handle_dcd_change() and uart_handle_cts_change(). They call into the tty layer and do their own wakeups. So no, serial8250_modem_status() cannot be invoked there at all. You have to defer this whole status dance to irq work and this really needs to be done inside the write_atomic() callback. Otherwise a status change could get lost, which is bad in non-panic situations. That needs a bit of thought vs. port->msr_saved_flags, because in a hostile takeover situation that needs to take into account that the interrupted context might be fiddling with msr_saved_flags too, which might on resume overwrite the write_atomic() modifications due to RMW. Shouldn't be hard. That brings me to that USE_SERIAL_8250_LEGACY_CONSOLE #ifdeffery, which started this conversation. The nbcon conversion does not make things worse than they are today. Any problem which happens in the atomic_write() callback has existed before already. So just get rid of the legacy code and be done with it. At some point you have to bite the bullet and deal with the fallout when it's reported. Remember, perfect is the enemy of good and you will never reach perfect. Thanks, tglx