On Mon 2024-03-11 18:14:19, John Ogness wrote: > Hi Petr, > > Thanks for the detailed feedback. Here is a lengthy response. I hope it > clarifies the uart port and console fields. And I think you identified a > bug relating to the setup() callback. > > On 2024-02-23, Petr Mladek <pmladek@xxxxxxxx> wrote: > > My main (only) concern was the synchronization of the various accessed > > variables, especially, port->cons. > > The only issue is if port->cons disappears between lock and unlock. I > see there is code setting port->cons to NULL, although I do not know > why. Once port->cons is set, there is never a reason to unset it. I wonder if it might be needed for hotplugging of the device or the driver. Well, I would expect that the structures won't exist when the driver is not loaded and/or the device providing the port does not exist. But maybe, it is just a defensive programming style where unused pointers are cleared. > Regardless, I will add port->lock synchronization when modifying > port->cons. There are only a few code sites and they are all during > driver setup. > > > Note: I am not completely sure how the early and valid console drivers > > share the same struct uart_port. So, maybe I miss some important > > guarantee. > > The struct uart_port is _not_ shared between the early and normal > consoles. However, the struct console is shared for normal consoles > amoung various ports of a particular driver. I see. > > Anyway. synchronization of port->cons looks like a shade area. > > IMHO, the existing code expects that it is used _only when the console > > is registered_. But this patch wants to access it _even before > > the console is registered_! > > Indeed. It is not enough for uart_is_nbcon() to check if it is an > NBCON. It must also check if it is registered, locklessly: > > hlist_unhashed_lockless(&con->node); > > Most importantly to be sure that nbcon_init() has already been called. > register_console() clears the nbcon state after cons->index has been > set, but before the console has been added to the list. Makes sense. Well, it brings another question. Does this allow to have the following situation? CPU0 CPU1 some_function() uart_port_lock() // locked just with up->lock // doing something with the port register_console() // add struct console using the same // port as CPU0 printk() console_try_lock() console_unlock() console_flush_all() // acquire context for the newly // registered nbcon nbcon_context_try_acquire(ctxt) con->write() BANG: Both CPU0 and CPU1 are writing to the same port. Reason: CPU0 locked only via port->lock. CPU1 locked only by acquiring nbcon context. Maybe, this is not possible because the console is registered when the struct uart_port is being initialized and nobody could use the same port in parallel, except for the early console. Where the early console is serialized using the console_lock(). Hmm, if the parallel use of struct port_lock is not possible during register_console then we probably do not even need to serialize setting and clearing of port->cons. By other words, I wonder if printk() is the only nasty user of the uart ports. By "nasty user" I mean: + Using the same port even without struct uart_port by early console. + Using the port via struct uart_port even before the device is completely initialized. I assume that register_console() is called from the driver init call. For example, the device/port gets visible in sysfs. I wonder if anyone could trigger an operation on the port which it is being registered as a console. Sigh, if we agree that the above race is possible then I can't think of any elegant solution. Well, it might be better to be on the safe side: One solution would be to add nbcon consoles into the console_list under uart_port_lock(). Another solution would be to make sure that any code serialized by uart_port_lock() will be already synchronized by nbcon context while the nbcon is added into the console_list. Maybe, we could do this in con->setup() callback. Something like: * @need_nbcon_context: true when uart_port_lock() has to acquire nbcon context as well struct uart_port { bool need_nbcon_context; int uart_console_setup(struct console *con, char *options) { struct uart_8250_port *up = &serial8250_ports[co->index]; spin_lock_irq(&up->lock); up->need_nbcon_context=true; spin_unlock_irq(&up->lock); // do whatever the original uart_console_setup did } int uart_console_exit(struct console *con, char *options) { struct uart_8250_port *up = &serial8250_ports[co->index]; // do whatever the original uart_console_exit did spin_lock_irq(&up->lock); up->need_nbcon_context=false; spin_unlock_irq(&up->lock); } We might even use this variable in uart_nbcon_acquire() to decide whether we need to acquire the context or not. > > For example, it took me quite a lot of time to shake my head around: > > > > #define uart_console(port) \ > > ((port)->cons && (port)->cons->index == (port)->line) > > > > + port->cons and port->line are updated in the uart code. > > It seems that the update is not serialized by port->lock. > > Something might be done under port->mutex. > > > > + cons->index is updated in register_console() under > > console_list_lock. > > > > Spoiler: I propose a solution which does not use uart_console(). > > This check is necessary because multiple ports of a driver will set and > share the same port->cons value, even if they are not really the > console. I looked into enforcing that port->cons is NULL if it is not a > registered console, but this is tricky. port->cons is driver-internal > and hidden from printk. The driver will set port->cons in case it > becomes the console and printk will set cons->index once it has chosen > which port will be the actual console. But there is no way to unset > port->cons if a port was not chosen by printk. > > The various fields have the following meaning (AFAICT): > > port->line: An identifier to represent a particular port supported by a > driver. > > port->cons: The struct console to use if this port is chosen to be a > console. > > port->console: Boolean, true if this port was chosen to be a > console. (Used only by the tty layer.) > > cons->index: The port chosen by printk to be a console. > > None of those fields specify if the port is currently registered as a > console. For that you would need to check if port->cons->node is hashed > and then verify that port->line matches port->cons->index. This is what > uart_nbcon_acquire() needs to do (as well as check if it is an nbcon > console). This is a great description. It would be great to have it somewhere in the sources. Maybe, above the locking/acquire functions. > >> --- a/drivers/tty/serial/8250/8250_port.c > >> +++ b/drivers/tty/serial/8250/8250_port.c > >> @@ -3284,6 +3284,7 @@ void serial8250_init_port(struct uart_8250_port *up) > >> struct uart_port *port = &up->port; > >> > >> spin_lock_init(&port->lock); > >> + port->nbcon_locked_port = false; > > > > I am not sure if the variable really helps anything: > > > > + nbcon_context release() must handle the case when it > > lost the ownership anyway. > > uart_nbcon_release() must first check if the provided port is a > registered nbcon console. A simple boolean check is certainly faster > than the 4 checks mentioned above. After all, if it was never locked, > there is no reason to unlock it. Fair enough. > > We just need to make sure that port->cons can be cleared only > > under port->lock. But this would be required even with > > port->nbcon_locked_port. > > Agreed. I will add this. > > >> --- a/kernel/printk/nbcon.c > >> +++ b/kernel/printk/nbcon.c > >> +void uart_nbcon_acquire(struct uart_port *up) > >> +{ > >> + struct console *con = up->cons; > >> + struct nbcon_context ctxt; > > > > I would add: > > > > lockdep_assert_held(&up->lock); > > OK. > > >> + > >> + if (!uart_is_nbcon(up)) > >> + return; > >> + > >> + WARN_ON_ONCE(up->nbcon_locked_port); > >> + > >> + do { > >> + do { > >> + memset(&ctxt, 0, sizeof(ctxt)); > >> + ctxt.console = con; > >> + ctxt.prio = NBCON_PRIO_NORMAL; > >> + } while (!nbcon_context_try_acquire(&ctxt)); > >> + > >> + } while (!nbcon_context_enter_unsafe(&ctxt)); > >> + > >> + up->nbcon_locked_port = true; > >> +} > >> +EXPORT_SYMBOL_GPL(uart_nbcon_acquire); > > > > I would prefer to split the uart and nbcon specific code, for example: > > Can you explain why? This code will not be used anywhere else. It would have helped me with the review. The function is short but it touches internals from both uart_port and nbcon words: + Implements new variant of nbcon_ctxt_acquire() API (busy loop, no timeout) + Checks and modifies struct uart_port details which affect uart_port_lock() API. IMHO, there is too much to think about in a single function ;-) Best Regards, Petr