Re: [Regression] "serial: 8250: use THRE & __stop_tx also with DMA" breaks DMA rx

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 14 Mar 2023, Hans de Goede wrote:
> On 3/14/23 17:55, Ilpo Järvinen wrote:
> > On Tue, 14 Mar 2023, Hans de Goede wrote:
> >> On 3/14/23 12:48, Ilpo Järvinen wrote:
> >>> On Tue, 14 Mar 2023, Hans de Goede wrote:
> >>>> On 3/14/23 11:55, Ilpo Järvinen wrote:
> >>>>> On Tue, 14 Mar 2023, Hans de Goede wrote:
> >>>>>
> >>>>>> I have spend the last couple of days debugging a problem with Bluetooth
> >>>>>> adapters (HCIs) connected over an UART connection on Intel Bay Trail
> >>>>>> and Cherry Trail devices.
> >>>>>>
> >>>>>> After much debugging I found out that sometimes the first byte of
> >>>>>> a received packet (data[0]) would be overwritten with a 0 byte.
> >>>>>>
> >>>>>> E.g. this packet received during init of a BCM4324B3 (1) Bluetooth HCI:
> >>>>>>
> >>>>>> 04 0e 0a 01 79 fc 00 54 fe ff ff 00 00
> >>>>>>
> >>>>>> would sometimes turn into:
> >>>>>>
> >>>>>> 00 0e 0a 01 79 fc 00 54 fe ff ff 00 00
> >>>>>>
> >>>>>> Further investigation revealed that this goes away if I stop
> >>>>>> the dw_dmac module from loading, leading to:
> >>>>>> "dw-apb-uart 80860F0A:00: failed to request DMA"
> >>>>>> and the UART working without DMA support.
> >>>>>>
> >>>>>> Testing various kernels showed that this problem was introduced
> >>>>>> in v5.19, v5.18 - v5.18.19 are fine. An a git bisect points to:
> >>>>>>
> >>>>>> e8ffbb71f783 ("serial: 8250: use THRE & __stop_tx also with DMA")
> >>>>>>
> >>>>>> And reverting that on top of v6.3-rc2 indeed solves the problem.
> >>>>>
> >>>>>> So it seems that that commit somehow interferes with DMA based
> >>>>
> >>>>> Maybe the the extra interrupt that the tx side change will trigger somehow 
> >>>>> causes the confusion on the rx side. So that would be an extra call into 
> >>>>> handle_rx_dma() that could either do an extra flush or start DMA Rx that 
> >>>>> would not occur w/o that tx side change.
> >>>>
> >>>> That sounds like a likely candidate for causing this yes, as said
> >>>> I'm unfamiliar with the serial-port code, but I did already suspect
> >>>> that the change was causing some extra interrupt which somehow
> >>>> interfered with the rx side.
> >>>>
> >>>>>> The issue has been seen on and the revert has been tested on
> >>>>>> the following HW:
> >>>>>>
> >>>>>> Asus T100TA
> >>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3
> >>>>>>
> >>>>>> Lenovo Yoga Tablet 2 1051L
> >>>>>> SoC: Bay Trail UART: 80860F0A WIFI: brcmfmac43241b4-sdio BT: BCM4324B3
> >>>>>>
> >>>>>> Lenovo Yoga Book X91F
> >>>>>> Soc: Cherry Trail UART: 8086228A WIFI: brcmfmac4356-pcie BT: BCM4356A2
> >>>>>>
> >>>>>> I have more hw which I believe is affected but these are the models
> >>>>>> where I have done detailed testing.
> >>>>>>
> >>>>>> I would be happy to test any patches, or run a kernel with some extra
> >>>>>> debugging added, just let me know what you need to help fixing this.

> > From: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> > Subject: [PATCH] serial: 8250: Prevent starting up DMA Rx on THRI interrupt
> > 
> > Hans de Goede reported Bluetooth adapters (HCIs) connected over an UART
> > connection failed due corrupted Rx payload. The problem was narrowed
> > down to DMA Rx starting on UART_IIR_THRI interrupt. The problem occurs 
> > despite LSR having DR bit set, which is precondition for attempting to 
> > start DMA Rx.
> > 
> > This problem became apparent after e8ffbb71f783 ("serial: 8250: use
> > THRE & __stop_tx also with DMA") changed Tx stopping to get triggered
> > using THRI interrupt.
> > 
> > Don't setup DMA Rx on UART_IIR_THRI but leave it to a subsequent
> > interrupt which has Rx related IIR value.
> > 
> > Reported-by: Hans de Goede <hdegoede@xxxxxxxxxx>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> 
> I can confirm that this fixes things for me:
> 
> Tested-by: Hans de Goede <hdegoede@xxxxxxxxxx>

Okay, thanks for testing.

Here's is a slightly improved debug patch which will count how many 
characters are received by DMA and non-DMA rx. It should be tested 
WITHOUT the fix.

I'm mostly interested in knowing if it's purely DMA Rx issue or whether 
the non-DMA rx is muddling the waters too. While investigating other 
issues I've seen UART_IIR_TIMEOUT (inter-character timeout) to often come 
so early from UART that the tail of characters is left there to be handled 
by the non-DMA path.


From: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
Subject: [PATCH] DEBUG: Dma rx-problem

A DEBUG PATCH, not intended for upstream inclusing.

---
 drivers/tty/serial/8250/8250_dma.c  | 17 ++++++++++++-----
 drivers/tty/serial/8250/8250_port.c | 19 +++++++++++++++++++
 include/linux/serial_8250.h         |  3 +++
 3 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
index 7fa66501792d..705c26941429 100644
--- a/drivers/tty/serial/8250/8250_dma.c
+++ b/drivers/tty/serial/8250/8250_dma.c
@@ -32,13 +32,15 @@ static void __dma_tx_complete(void *param)
 		uart_write_wakeup(&p->port);
 
 	ret = serial8250_tx_dma(p);
-	if (ret || !dma->tx_running)
+	if (ret || !dma->tx_running) {
+		p->irq_debug = -5;
 		serial8250_set_THRI(p);
+	}
 
 	spin_unlock_irqrestore(&p->port.lock, flags);
 }
 
-static void __dma_rx_complete(struct uart_8250_port *p)
+static int __dma_rx_complete(struct uart_8250_port *p)
 {
 	struct uart_8250_dma	*dma = p->dma;
 	struct tty_port		*tty_port = &p->port.state->port;
@@ -53,7 +55,7 @@ static void __dma_rx_complete(struct uart_8250_port *p)
 	 */
 	dma_status = dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
 	if (dma_status == DMA_IN_PROGRESS)
-		return;
+		return -1;
 
 	count = dma->rx_size - state.residue;
 
@@ -62,6 +64,8 @@ static void __dma_rx_complete(struct uart_8250_port *p)
 	dma->rx_running = 0;
 
 	tty_flip_buffer_push(tty_port);
+
+	return count;
 }
 
 static void dma_rx_complete(void *param)
@@ -69,10 +73,11 @@ static void dma_rx_complete(void *param)
 	struct uart_8250_port *p = param;
 	struct uart_8250_dma *dma = p->dma;
 	unsigned long flags;
+	int count = -1;
 
 	spin_lock_irqsave(&p->port.lock, flags);
 	if (dma->rx_running)
-		__dma_rx_complete(p);
+		count = __dma_rx_complete(p);
 
 	/*
 	 * Cannot be combined with the previous check because __dma_rx_complete()
@@ -81,6 +86,8 @@ static void dma_rx_complete(void *param)
 	if (!dma->rx_running && (serial_lsr_in(p) & UART_LSR_DR))
 		p->dma->rx_dma(p);
 	spin_unlock_irqrestore(&p->port.lock, flags);
+
+	pr_err("8250dma: rx complete %d\n", count);
 }
 
 int serial8250_tx_dma(struct uart_8250_port *p)
@@ -172,7 +179,7 @@ void serial8250_rx_dma_flush(struct uart_8250_port *p)
 
 	if (dma->rx_running) {
 		dmaengine_pause(dma->rxchan);
-		__dma_rx_complete(p);
+		p->dma_rx_count = __dma_rx_complete(p);
 		dmaengine_terminate_async(dma->rxchan);
 	}
 }
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index fa43df05342b..0c950ff7a8ee 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -1811,6 +1811,7 @@ u16 serial8250_rx_chars(struct uart_8250_port *up, u16 lsr)
 	} while (lsr & (UART_LSR_DR | UART_LSR_BI));
 
 	tty_flip_buffer_push(&port->state->port);
+	up->rx_chars_count = 256 - max_count;
 	return lsr;
 }
 EXPORT_SYMBOL_GPL(serial8250_rx_chars);
@@ -1923,6 +1924,10 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
 	struct uart_8250_port *up = up_to_u8250p(port);
 	bool skip_rx = false;
 	unsigned long flags;
+	long irq_debug;
+	unsigned int tx_running = 0, rx_running = 0, tx_err = 0;
+	unsigned int ier;
+	int rx_count, dma_count;
 	u16 status;
 
 	if (iir & UART_IIR_NO_INT)
@@ -1931,6 +1936,14 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
 	spin_lock_irqsave(&port->lock, flags);
 
 	status = serial_lsr_in(up);
+	up->dma_rx_count = 0;
+	up->rx_chars_count = 0;
+	if (up->dma) {
+		rx_running = up->dma->rx_running;
+		tx_running = up->dma->tx_running;
+		tx_err = up->dma->tx_err;
+	}
+	ier = up->ier;
 
 	/*
 	 * If port is stopped and there are no error conditions in the
@@ -1957,7 +1970,13 @@ int serial8250_handle_irq(struct uart_port *port, unsigned int iir)
 			__stop_tx(up);
 	}
 
+	irq_debug = up->irq_debug++;
+	rx_count = up->rx_chars_count;
+	dma_count = up->dma_rx_count;
 	uart_unlock_and_check_sysrq_irqrestore(port, flags);
+	if (irq_debug < 0)
+		pr_err("8250irq: iir=%02x lsr+saved=%02x received=%d/%d ier=%02x dma_t/rx/err=%u/%u/%u cnt=%ld\n",
+		       iir, status, dma_count, rx_count, ier, tx_running, rx_running, tx_err, irq_debug);
 
 	return 1;
 }
diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h
index 741ed4807a9c..4a0c0d549d7b 100644
--- a/include/linux/serial_8250.h
+++ b/include/linux/serial_8250.h
@@ -107,6 +107,9 @@ struct uart_8250_port {
 	unsigned char		mcr;
 	unsigned char		cur_iotype;	/* Running I/O type */
 	unsigned int		rpm_tx_active;
+	long			irq_debug;
+	int			rx_chars_count;
+	int			dma_rx_count;
 	unsigned char		canary;		/* non-zero during system sleep
 						 *   if no_console_suspend
 						 */

-- 
tg: (571079f5ba93..) debug/dma-rx-problem (depends on: tty-next)

[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux