Re: [PATCH v4 7/8] serial: qcom-geni: Fix suspend while active UART xfer

On Mon, Jun 10, 2024 at 03:24:25PM -0700, Douglas Anderson wrote:
> On devices using Qualcomm's GENI UART it is possible to get the UART
> stuck such that it no longer outputs data. Specifically, logging in
> via an agetty on the debug serial port (which was _not_ used for
> kernel console) and running:
>   cat /var/log/messages
> ...and then (via an SSH session) forcing a few suspend/resume cycles
> causes the UART to stop transmitting.

An easier way to trigger this old bug is to just run a command like
dmesg and hit ctrl-s in a serial console to stop tx. Interrupting the
command or hitting ctrl-q to restart tx then triggers the soft lockup.

> The root of the problems was with qcom_geni_serial_stop_tx_fifo()
> which is called as part of the suspend process. Specific problems with
> that function:
> - When an in-progress "tx" command is cancelled it doesn't appear to
>   fully drain the FIFO. That meant qcom_geni_serial_tx_empty()
>   continued to report that the FIFO wasn't empty. The
>   qcom_geni_serial_start_tx_fifo() function didn't re-enable
>   interrupts in this case so the driver would never start transferring
>   again.
> - When the driver cancelled the current "tx" command it forgot to
>   zero out "tx_remaining". This confused logic elsewhere in the
>   driver.
> - From experimentation, it appears that cancelling the "tx" command
>   could drop some of the queued up bytes.
> 
> While qcom_geni_serial_stop_tx_fifo() could be fixed to drain the FIFO
> and shut things down properly, stop_tx() isn't supposed to be a slow
> function. It is run with local interrupts off and is documented to
> stop transmitting "as soon as possible". Change the function to just
> stop new bytes from being queued. In order to make this work, change
> qcom_geni_serial_start_tx_fifo() to remove some conditions. It's
> always safe to enable the watermark interrupt and the IRQ handler will
> disable it if it's not needed.
> 
> For system suspend the queue still needs to be drained. Failure to do
> so means that the hardware won't provide new interrupts until a
> "cancel" command is sent. Add draining logic (fixing the issues noted
> above) at suspend time.

So I spent the better part of the weekend looking at this driver and
this is one of the bits I worry about with your approach, as relying on
draining anything won't work with hardware flow control (the remote end
can hold off transmission indefinitely).

Cancelling commands can result in stalled TX in a number of ways and
there's still at least one that you don't handle. If you end up with
data in the FIFO, the watermark interrupt may never fire when you try
to restart tx.
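
To make that failure mode concrete, here is a toy model of one way it
could happen. This is a sketch under stated assumptions and not the GENI
hardware interface: it assumes the FIFO only drains while a command is
active and that the watermark interrupt fires only when the fill level
drops from above the watermark to at or below it. All of the toy_uart
names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model only. Every name and rule below is an assumption made for
 * illustration; none of it is the GENI register interface.
 */
struct toy_uart {
	size_t fifo_level;	/* words currently sitting in the TX FIFO */
	size_t watermark;	/* threshold for the watermark interrupt */
	bool cmd_active;	/* assumed: FIFO drains only while a command runs */
	bool wm_irq_enabled;
	unsigned int wm_irqs;	/* watermark interrupts delivered so far */
};

/* Advance the model by one time step. */
static void toy_uart_tick(struct toy_uart *u)
{
	size_t before = u->fifo_level;

	if (u->cmd_active && u->fifo_level > 0)
		u->fifo_level--;

	/* Assumed: the interrupt fires on the downward crossing only. */
	if (u->wm_irq_enabled && before > u->watermark &&
	    u->fifo_level <= u->watermark)
		u->wm_irqs++;
}

/* Cancel the active command; the data already queued stays in the FIFO. */
static void toy_uart_cancel(struct toy_uart *u)
{
	u->cmd_active = false;
}

/* Restart tx the cheap way: only enable the watermark interrupt. */
static void toy_uart_start_tx(struct toy_uart *u)
{
	u->wm_irq_enabled = true;
}

/* Return true if a watermark interrupt arrives within max_ticks of a restart. */
static bool toy_uart_restart_makes_progress(struct toy_uart *u,
					    unsigned int max_ticks)
{
	unsigned int seen = u->wm_irqs;
	unsigned int i;

	toy_uart_start_tx(u);
	for (i = 0; i < max_ticks; i++) {
		toy_uart_tick(u);
		if (u->wm_irqs != seen)
			return true;
	}
	return false;
}
```

In this model, a port with eight words queued, a watermark of two and an
active command gets its interrupt after a few ticks. The same port after
toy_uart_cancel() never does: the level stays at eight, nothing crosses
the watermark, and tx stays stalled no matter how long you wait.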

I'm leaning towards fixing the immediate hard lockup regression
separately and then we can address the older bugs and rework the driver
without having to rush things.

I've prepared a minimal three patch series which fixes most of the
discussed issues (hard and soft lockup and garbage characters) and that
should be backportable as well.

Currently, the diffstat is just:

	 drivers/tty/serial/qcom_geni_serial.c | 36 +++++++++++++++++++++++++-----------
	 1 file changed, 25 insertions(+), 11 deletions(-)

Fixing the 6.10-rc1 hard lockup regression takes just a single line.

Johan



