On Tue, 4 Jun 2024, Douglas Anderson wrote: > On devices using Qualcomm's GENI UART it is possible to get the UART > stuck such that it no longer outputs data. Specifically, logging in > via an agetty on the debug serial port (which was _not_ used for > kernel console) and running: > cat /var/log/messages > ...and then (via an SSH session) forcing a few suspend/resume cycles > causes the UART to stop transmitting. > > The root of the problems was with qcom_geni_serial_stop_tx_fifo() > which is called as part of the suspend process. Specific problems with > that function: > - When an in-progress "tx" command is cancelled it doesn't appear to > fully drain the FIFO. That meant qcom_geni_serial_tx_empty() > continued to report that the FIFO wasn't empty. The > qcom_geni_serial_start_tx_fifo() function didn't re-enable > interrupts in this case so the driver would never start transferring > again. > - When the driver cancelled the current "tx" command but it forgot to > zero out "tx_remaining". This confused logic elsewhere in the > driver. > - From experimentation, it appears that cancelling the "tx" command > could drop some of the queued up bytes. > > While qcom_geni_serial_stop_tx_fifo() could be fixed to drain the FIFO > and shut things down properly, stop_tx() isn't supposed to be a slow > function. It is run with local interrupts off and is documented to > stop transmitting "as soon as possible". Change the function to just > stop new bytes from being queued. In order to make this work, change > qcom_geni_serial_start_tx_fifo() to remove some conditions. It's > always safe to enable the watermark interrupt and the IRQ handler will > disable it if it's not needed. > > For system suspend the queue still needs to be drained. Failure to do > so means that the hardware won't provide new interrupts until a > "cancel" command is sent. Add draining logic (fixing the issues noted > above) at suspend time. > > NOTE: It would be ideal if qcom_geni_serial_stop_tx_fifo() could > "pause" the transmitter right away. There is no obvious way to do this > in the docs and experimentation didn't find any tricks either, so > stopping TX "as soon as possible" isn't very soon but is the best > possible. > > Fixes: c4f528795d1a ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP") > Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx> > --- > There are still a number of problems with GENI UART after this but > I've kept this change separate to make it easier to understand. > Specifically on mainline just hitting "Ctrl-C" after dumping > /var/log/messages to the serial port hangs things after the kfifo > changes. Those issues will be addressed in future patches. > > It should also be noted that the "Fixes" tag here is a bit of a > swag. I haven't gone and tested on ancient code, but at least the > problems exist on kernel 5.15 and much of the code touched here has > been here since the beginning, or at least since as long as the driver > was stable. > > Changes in v3: > - 0xffffffff => GENMASK(31, 0) > - Reword commit message. > > Changes in v2: > - Totally rework / rename patch to handle suspend while active xfer > > drivers/tty/serial/qcom_geni_serial.c | 97 +++++++++++++++++++++------ > 1 file changed, 75 insertions(+), 22 deletions(-) > > diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c > index 4dbc59873b34..46b6674d90c5 100644 > --- a/drivers/tty/serial/qcom_geni_serial.c > +++ b/drivers/tty/serial/qcom_geni_serial.c > @@ -130,6 +130,7 @@ struct qcom_geni_serial_port { > bool brk; > > unsigned int tx_remaining; > + unsigned int tx_total; > int wakeup_irq; > bool rx_tx_swap; > bool cts_rts_swap; > @@ -311,11 +312,14 @@ static bool qcom_geni_serial_poll_bit(struct uart_port *uport, > > static void qcom_geni_serial_setup_tx(struct uart_port *uport, u32 xmit_size) > { > + struct qcom_geni_serial_port *port = to_dev_port(uport); > u32 m_cmd; > > writel(xmit_size, uport->membase + SE_UART_TX_TRANS_LEN); > m_cmd = UART_START_TX << M_OPCODE_SHFT; Unrelated to this patch and won't belong to this patch but I noticed it while reviewing. This could be converted into: m_cmd = FIELD_PREP(M_OPCODE_MSK, UART_START_TX); (and after converting the other use in the header file, the SHFT define becomes unused). > writel(m_cmd, uport->membase + SE_GENI_M_CMD0); > + > + port->tx_total = xmit_size; > } > > static void qcom_geni_serial_poll_tx_done(struct uart_port *uport) > @@ -335,6 +339,64 @@ static void qcom_geni_serial_poll_tx_done(struct uart_port *uport) > writel(irq_clear, uport->membase + SE_GENI_M_IRQ_CLEAR); > } > > +static void qcom_geni_serial_drain_tx_fifo(struct uart_port *uport) > +{ > + struct qcom_geni_serial_port *port = to_dev_port(uport); > + > + /* > + * If the main sequencer is inactive it means that the TX command has > + * been completed and all bytes have been sent. Nothing to do in that > + * case. > + */ > + if (!qcom_geni_serial_main_active(uport)) > + return; > + > + /* > + * Wait until the FIFO has been drained. We've already taken bytes out > + * of the higher level queue in qcom_geni_serial_send_chunk_fifo() so > + * if we don't drain the FIFO but send the "cancel" below they seem to > + * get lost. > + */ > + qcom_geni_serial_poll_bitfield(uport, SE_GENI_M_GP_LENGTH, GENMASK(31, 0), That GENMASK(31, 0) is a field in a register (even if it covers the entire register)? It should be named with a define instead of creating the field mask here in an online fashion. > + port->tx_total - port->tx_remaining); > + > + /* > + * If clearing the FIFO made us inactive then we're done--no need for > + * a cancel. > + */ > + if (!qcom_geni_serial_main_active(uport)) > + return; > + > + /* > + * Cancel the current command. After this the main sequencer will > + * stop reporting that it's active and we'll have to start a new > + * transfer command. > + * > + * If we skip doing this cancel and then continue with a system > + * suspend while there's an active command in the main sequencer > + * then after resume time we won't get any more interrupts on the > + * main sequencer until we send the cancel. > + */ > + geni_se_cancel_m_cmd(&port->se); > + if (!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS, > + M_CMD_CANCEL_EN, true)) { > + /* The cancel failed; try an abort as a fallback. */ > + geni_se_abort_m_cmd(&port->se); > + qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS, > + M_CMD_ABORT_EN, true); Misaligned. -- i. > + writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR); > + } > + writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR); > + > + /* > + * We've cancelled the current command. "tx_remaining" stores how > + * many bytes are left to finish in the current command so we know > + * when to start a new command. Since the command was cancelled we > + * need to zero "tx_remaining". > + */ > + port->tx_remaining = 0; > +} > + > static void qcom_geni_serial_abort_rx(struct uart_port *uport) > { > u32 irq_clear = S_CMD_DONE_EN | S_CMD_ABORT_EN; > @@ -655,37 +717,18 @@ static void qcom_geni_serial_start_tx_fifo(struct uart_port *uport) > { > u32 irq_en; > > - if (qcom_geni_serial_main_active(uport) || > - !qcom_geni_serial_tx_empty(uport)) > - return; > - > irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN); > irq_en |= M_TX_FIFO_WATERMARK_EN | M_CMD_DONE_EN; > - > writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN); > } > > static void qcom_geni_serial_stop_tx_fifo(struct uart_port *uport) > { > u32 irq_en; > - struct qcom_geni_serial_port *port = to_dev_port(uport); > > irq_en = readl(uport->membase + SE_GENI_M_IRQ_EN); > irq_en &= ~(M_CMD_DONE_EN | M_TX_FIFO_WATERMARK_EN); > writel(irq_en, uport->membase + SE_GENI_M_IRQ_EN); > - /* Possible stop tx is called multiple times. */ > - if (!qcom_geni_serial_main_active(uport)) > - return; > - > - geni_se_cancel_m_cmd(&port->se); > - if (!qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS, > - M_CMD_CANCEL_EN, true)) { > - geni_se_abort_m_cmd(&port->se); > - qcom_geni_serial_poll_bit(uport, SE_GENI_M_IRQ_STATUS, > - M_CMD_ABORT_EN, true); > - writel(M_CMD_ABORT_EN, uport->membase + SE_GENI_M_IRQ_CLEAR); > - } > - writel(M_CMD_CANCEL_EN, uport->membase + SE_GENI_M_IRQ_CLEAR); > } > > static void qcom_geni_serial_handle_rx_fifo(struct uart_port *uport, bool drop) > @@ -1067,7 +1110,15 @@ static int setup_fifos(struct qcom_geni_serial_port *port) > } > > > -static void qcom_geni_serial_shutdown(struct uart_port *uport) > +static void qcom_geni_serial_shutdown_dma(struct uart_port *uport) > +{ > + disable_irq(uport->irq); > + > + qcom_geni_serial_stop_tx(uport); > + qcom_geni_serial_stop_rx(uport); > +} > + > +static void qcom_geni_serial_shutdown_fifo(struct uart_port *uport) > { > disable_irq(uport->irq); > > @@ -1076,6 +1127,8 @@ static void qcom_geni_serial_shutdown(struct uart_port *uport) > > qcom_geni_serial_stop_tx(uport); > qcom_geni_serial_stop_rx(uport); > + > + qcom_geni_serial_drain_tx_fifo(uport); > } > > static int qcom_geni_serial_port_setup(struct uart_port *uport) > @@ -1533,7 +1586,7 @@ static const struct uart_ops qcom_geni_console_pops = { > .startup = qcom_geni_serial_startup, > .request_port = qcom_geni_serial_request_port, > .config_port = qcom_geni_serial_config_port, > - .shutdown = qcom_geni_serial_shutdown, > + .shutdown = qcom_geni_serial_shutdown_fifo, > .type = qcom_geni_serial_get_type, > .set_mctrl = qcom_geni_serial_set_mctrl, > .get_mctrl = qcom_geni_serial_get_mctrl, > @@ -1555,7 +1608,7 @@ static const struct uart_ops qcom_geni_uart_pops = { > .startup = qcom_geni_serial_startup, > .request_port = qcom_geni_serial_request_port, > .config_port = qcom_geni_serial_config_port, > - .shutdown = qcom_geni_serial_shutdown, > + .shutdown = qcom_geni_serial_shutdown_dma, > .type = qcom_geni_serial_get_type, > .set_mctrl = qcom_geni_serial_set_mctrl, > .get_mctrl = qcom_geni_serial_get_mctrl, >