Re: Possible regression in 8250_dw driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 24 May 2023, Richard Tresidder wrote:
> On 23/05/2023 7:13 pm, Ilpo Järvinen wrote:
> > On Tue, 23 May 2023, Richard Tresidder wrote:
> > 
> > > background:
> > > I'm attempting to use 6.3.3 as a new base for one of our systems.
> > > Previously it was using 5.1.7 as a base.
> > > The uart in question is one of the two in the Cyclone V SOC HPS.
> > > And to muddy the waters the linux console TTYS0 is the other Uart from the
> > > same HPS core
> > > Yet the console appears to be working ok.
> > Maybe some of the DMA related changes triggering some unexpected behavior.
> > 
> > Console doesn't use DMA so that could explain the difference.
> > 
> > > Note all other libs and apps are at the same revision and build, it is
> > > only
> > > the kernel that is different.
> > > Both versions of the kernel are also built using the same bitbake bsdk..
> > > 
> > > 
> > > As you can see it looks like the frame thats received on the 6.3.3 kernel
> > > is
> > > mangled?
> > > This same message is just being requested over and over again from the GPS
> > > unit.
> > > 
> > > The offset where the tears occur looks to be pretty similar between each
> > > poll
> > > request.
> > > Usually the 1 at the end of the (75331 is where the first tear occurs.
> > > 
> > > I'd appreciate some quidance in how to track this down as there appears to
> > > have been a reasonable amount of work done to this driver and the serial
> > > core
> > > between these two versions.
> > A few ideas:
> > - try without dma_rx_complete() calling p->dma->rx_dma(p)
> > - revert 90b8596ac46043e4a782d9111f5b285251b13756
> > - Try the revert in
> > https://lore.kernel.org/all/316ab583-d217-a332-d161-8225b0cee227@xxxxxxxxxx/2-0001-Revert-serial-8250-use-THRE-__stop_tx-also-with-DMA.patch
> >    (for e8ffbb71f783 and f8d6e9d3ca5c)
> > 
> > But finding the culprit with git bisect would be the most helpful here.
> > 
> Bisect wasn't an easy option as I'd applied a pile of patches after the
> interesting commits for our system to run.
> I'm not a git master :)
> 
> So I just started reverting the patches that had been applied to the 8250
> folder.
> Worked backwards from head.
> 
> After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
> serial: 8250_dma: Fix DMA Rx rearm race
> 
> Things started to work again.
> 
> I reset everything and then just reverted that individual patch and things
> work.
> So that looks like the culprit..

Okay, that also worked great, thanks a lot. It gives something concrete to 
work with to find a fix.

The commit itself looks very much correct, however, my guess is that when 
serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow 
does not become DMA_PAUSED which leads to not properly flushing DMA Rx 
transaction. Can you try the following debug patch (if my guess is 
correct, besides triggering the WARN_ON_ONCE, it also works around the 
issue):

[PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS inconsistency

Detect DMA state not returning DMA_PAUSED after dmaengine_pause() was
issued.

Not intended for upstream.

---
 drivers/tty/serial/8250/8250_dma.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_dma.c b/drivers/tty/serial/8250/8250_dma.c
index 7fa66501792d..3dedacd9f104 100644
--- a/drivers/tty/serial/8250/8250_dma.c
+++ b/drivers/tty/serial/8250/8250_dma.c
@@ -38,7 +38,7 @@ static void __dma_tx_complete(void *param)
 	spin_unlock_irqrestore(&p->port.lock, flags);
 }
 
-static void __dma_rx_complete(struct uart_8250_port *p)
+static void __dma_rx_complete(struct uart_8250_port *p, bool paused)
 {
 	struct uart_8250_dma	*dma = p->dma;
 	struct tty_port		*tty_port = &p->port.state->port;
@@ -52,8 +52,12 @@ static void __dma_rx_complete(struct uart_8250_port *p)
 	 * anything in such case.
 	 */
 	dma_status = dmaengine_tx_status(dma->rxchan, dma->rx_cookie, &state);
-	if (dma_status == DMA_IN_PROGRESS)
-		return;
+	if (dma_status == DMA_IN_PROGRESS) {
+		if (!paused)
+			return;
+
+		WARN_ON_ONCE(paused);
+	}
 
 	count = dma->rx_size - state.residue;
 
@@ -72,7 +76,7 @@ static void dma_rx_complete(void *param)
 
 	spin_lock_irqsave(&p->port.lock, flags);
 	if (dma->rx_running)
-		__dma_rx_complete(p);
+		__dma_rx_complete(p, false);
 
 	/*
 	 * Cannot be combined with the previous check because __dma_rx_complete()
@@ -172,7 +176,7 @@ void serial8250_rx_dma_flush(struct uart_8250_port *p)
 
 	if (dma->rx_running) {
 		dmaengine_pause(dma->rxchan);
-		__dma_rx_complete(p);
+		__dma_rx_complete(p, true);
 		dmaengine_terminate_async(dma->rxchan);
 	}
 }

-- 
tg: (ac9a78681b92..) debug/dma-paused (depends on: tty-next)

[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux