On Tue, 14 Mar 2023, Hans de Goede wrote: > On 3/14/23 17:55, Ilpo Järvinen wrote: > > On Tue, 14 Mar 2023, Hans de Goede wrote: > >> On 3/14/23 12:48, Ilpo Järvinen wrote: > >>> On Tue, 14 Mar 2023, Hans de Goede wrote: > >>>> On 3/14/23 11:55, Ilpo Järvinen wrote: > >>>>> On Tue, 14 Mar 2023, Hans de Goede wrote: > >>>>> > >>>>>> I have spend the last couple of days debugging a problem with Bluetooth > >>>>>> adapters (HCIs) connected over an UART connection on Intel Bay Trail > >>>>>> and Cherry Trail devices. > >>>>>> > >>>>>> After much debugging I found out that sometimes the first byte of > >>>>>> a received packet (data[0]) would be overwritten with a 0 byte. > >>>>>> > >>>>>> E.g. this packet received during init of a BCM4324B3 (1) Bluetooth HCI: > >>>>>> > >>>>>> 04 0e 0a 01 79 fc 00 54 fe ff ff 00 00 > >>>>>> > >>>>>> would sometimes turn into: > >>>>>> > >>>>>> 00 0e 0a 01 79 fc 00 54 fe ff ff 00 00 > >>>>>> > >>>>>> Further investigation revealed that this goes away if I stop > >>>>>> the dw_dmac module from loading, leading to: > >>>>>> "dw-apb-uart 80860F0A:00: failed to request DMA" > >>>>>> and the UART working without DMA support. > >>>>>> > >>>>>> Testing various kernels showed that this problem was introduced > >>>>>> in v5.19, v5.18 - v5.18.19 are fine. An a git bisect points to: > >>>>>> > >>>>>> e8ffbb71f783 ("serial: 8250: use THRE & __stop_tx also with DMA") > >>>>>> > >>>>>> And reverting that on top of v6.3-rc2 indeed solves the problem. > >>>>> > >>>>>> So it seems that that commit somehow interferes with DMA based > >>>> > >>>>> Maybe the the extra interrupt that the tx side change will trigger somehow > >>>>> causes the confusion on the rx side. So that would be an extra call into > >>>>> handle_rx_dma() that could either do an extra flush or start DMA Rx that > >>>>> would not occur w/o that tx side change. > >>>> > >>>> That sounds like a likely candidate for causing this yes, as said > >>>> I'm unfamiliar with the serial-port code, but I did already suspect > >>>> that the change was causing some extra interrupt which somehow > >>>> interfered with the rx side. > >>>> > >>>>>> The issue has been seen on and the revert has been tested on > >>>>>> the following HW: > >>>>>> > >>>>>> Asus T100TA > >>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3 > >>>>>> > >>>>>> Lenovo Yoga Tablet 2 1051L > >>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3 > >>>>>> > >>>>>> Lenovo Yoga Book X91F > >>>>>> Soc: Cherry Trail UART: 8086228A WIFI: brcmfmac4356-pcie BT: BCM4356A2 > >>>>>> > >>>>>> I have more hw which I believe is affected but these are the models > >>>>>> where I have done detailed testing. > >>>>>> > >>>>>> I would be happy to test any patches, or run a kernel with some extra > >>>>>> debugging added, just let me know what you need to help fixing this. > > From: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> > > Subject: [PATCH] serial: 8250: Prevent starting up DMA Rx on THRI interrupt > > > > Hans de Goede reported Bluetooth adapters (HCIs) connected over an UART > > connection failed due corrupted Rx payload. The problem was narrowed > > down to DMA Rx starting on UART_IIR_THRI interrupt. The problem occurs > > despite LSR having DR bit set, which is precondition for attempting to > > start DMA Rx. > > > > This problem became apparent after e8ffbb71f783 ("serial: 8250: use > > THRE & __stop_tx also with DMA") changed Tx stopping to get triggered > > using THRI interrupt. > > > > Don't setup DMA Rx on UART_IIR_THRI but leave it to a subsequent > > interrupt which has Rx related IIR value. > > > > Reported-by: Hans de Goede <hdegoede@xxxxxxxxxx> > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> > > I can confirm that this fixes things for me: > > Tested-by: Hans de Goede <hdegoede@xxxxxxxxxx> Okay, thanks for testing. Here's is a slightly improved debug patch which will count how many characters are received by DMA and non-DMA rx. It should be tested WITHOUT the fix. I'm mostly interested in knowing if it's purely DMA Rx issue or whether the non-DMA rx is muddling the waters too. While investigating other issues I've seen UART_IIR_TIMEOUT (inter-character timeout) to often come so early from UART that the tail of characters is left there to be handled by the non-DMA path. From: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> Subject: [PATCH] DEBUG: Dma rx-problem A DEBUG PATCH, not intended for upstream inclusing. --- drivers/tty/serial/8250/8250_dma.c | 17 ++++++++++++----- drivers/tty/serial/8250/8250_port.c | 19 +++++++++++++++++++ include/linux/serial_8250.h | 3 +++ 3 files changed, 34 insertions(+), 5 deletions(-) diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c index 7fa66501792d..705c26941429 100644 --- a/drivers/tty/serial/8250/8250_dma.c +++ b/drivers/tty/serial/8250/8250_dma.c @@ -32,13 +32,15 @@ static void __dma_tx_complete(void *param) uart_write_wakeup(&p->port); ret = serial8250_tx_dma(p); - if (ret || !dma->tx_running) + if (ret || !dma->tx_running) { + p->irq_debug = -5; serial8250_set_THRI(p); + } spin_unlock_irqrestore(&p->port.lock, flags); } -static void __dma_rx_complete(struct uart_8250_port *p) +static int __dma_rx_complete(struct uart_8250_port *p) { struct uart_8250_dma *dma = p->dma; struct tty_port *tty_port = &p->port.state->port; @@ -53,7 +55,7 @@ static void __dma_rx_complete(struct uart_8250_port *p) */ dma_status = dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state); if (dma_status == DMA_IN_PROGRESS) - return; + return -1; count = dma->rx_size - state.residue; @@ -62,6 +64,8 @@ static void __dma_rx_complete(struct uart_8250_port *p) dma->rx_running = 0; tty_flip_buffer_push(tty_port); + + return count; } static void dma_rx_complete(void *param) @@ -69,10 +73,11 @@ static void dma_rx_complete(void *param) struct uart_8250_port *p = param; struct uart_8250_dma *dma = p->dma; unsigned long flags; + int count = -1; spin_lock_irqsave(&p->port.lock, flags); if (dma->rx_running) - __dma_rx_complete(p); + count = __dma_rx_complete(p); /* * Cannot be combined with the previous check because __dma_rx_complete() @@ -81,6 +86,8 @@ static void dma_rx_complete(void *param) if (!dma->rx_running && (serial_lsr_in(p) & UART_LSR_DR)) p->dma->rx_dma(p); spin_unlock_irqrestore(&p->port.lock, flags); + + pr_err("8250dma: rx complete %d\n", count); } int serial8250_tx_dma(struct uart_8250_port *p) @@ -172,7 +179,7 @@ void serial8250_rx_dma_flush(struct uart_8250_port *p) if (dma->rx_running) { dmaengine_pause(dma->rxchan); - __dma_rx_complete(p); + p->dma_rx_count = __dma_rx_complete(p); dmaengine_terminate_async(dma->rxchan); } } diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c index fa43df05342b..0c950ff7a8ee 100644 --- a/drivers/tty/serial/8250/8250_port.c +++ b/drivers/tty/serial/8250/8250_port.c @@ -1811,6 +1811,7 @@ u16 serial8250_rx_chars(struct uart_8250_port *up, u16 lsr) } while (lsr & (UART_LSR_DR | UART_LSR_BI)); tty_flip_buffer_push(&port->state->port); + up->rx_chars_count = 256 - max_count; return lsr; } EXPORT_SYMBOL_GPL(serial8250_rx_chars); @@ -1923,6 +1924,10 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir) struct uart_8250_port *up = up_to_u8250p(port); bool skip_rx = false; unsigned long flags; + long irq_debug; + unsigned int tx_running = 0, rx_running = 0, tx_err = 0; + unsigned int ier; + int rx_count, dma_count; u16 status; if (iir & UART_IIR_NO_INT) @@ -1931,6 +1936,14 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir) spin_lock_irqsave(&port->lock, flags); status = serial_lsr_in(up); + up->dma_rx_count = 0; + up->rx_chars_count = 0; + if (up->dma) { + rx_running = up->dma->rx_running; + tx_running = up->dma->tx_running; + tx_err = up->dma->tx_err; + } + ier = up->ier; /* * If port is stopped and there are no error conditions in the @@ -1957,7 +1970,13 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir) __stop_tx(up); } + irq_debug = up->irq_debug++; + rx_count = up->rx_chars_count; + dma_count = up->dma_rx_count; uart_unlock_and_check_sysrq_irqrestore(port, flags); + if (irq_debug < 0) + pr_err("8250irq: iir=%02x lsr+saved=%02x received=%d/%d ier=%02x dma_t/rx/err=%u/%u/%u cnt=%ld\n", + iir, status, dma_count, rx_count, ier, tx_running, rx_running, tx_err, irq_debug); return 1; } diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h index 741ed4807a9c..4a0c0d549d7b 100644 --- a/include/linux/serial_8250.h +++ b/include/linux/serial_8250.h @@ -107,6 +107,9 @@ struct uart_8250_port { unsigned char mcr; unsigned char cur_iotype; /* Running I/O type */ unsigned int rpm_tx_active; + long irq_debug; + int rx_chars_count; + int dma_rx_count; unsigned char canary; /* non-zero during system sleep * if no_console_suspend */ -- tg: (571079f5ba93..) debug/dma-rx-problem (depends on: tty-next)