Re: Possible regression in 8250_dw driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25/05/2023 4:43 pm, Ilpo Järvinen wrote:

On Thu, 25 May 2023, Richard Tresidder wrote:

Richard Tresidder

**

On 24/05/2023 6:06 pm, Ilpo Järvinen wrote:

On Wed, 24 May 2023, Richard Tresidder wrote:
On 23/05/2023 7:13 pm, Ilpo Järvinen wrote:
On Tue, 23 May 2023, Richard Tresidder wrote:

background:
I'm attempting to use 6.3.3 as a new base for one of our systems.
Previously it was using 5.1.7 as a base.
The uart in question is one of the two in the Cyclone V SOC HPS.
And to muddy the waters the linux console TTYS0 is the other Uart from
the
same HPS core
Yet the console appears to be working ok.
Maybe some of the DMA related changes triggering some unexpected
behavior.

Console doesn't use DMA so that could explain the difference.

Note all other libs and apps are at the same revision and build, it is
only
the kernel that is different.
Both versions of the kernel are also built using the same bitbake
bsdk..


As you can see it looks like the frame thats received on the 6.3.3
kernel
is
mangled?
This same message is just being requested over and over again from the
GPS
unit.

The offset where the tears occur looks to be pretty similar between
each
poll
request.
Usually the 1 at the end of the (75331 is where the first tear occurs.

I'd appreciate some quidance in how to track this down as there
appears to
have been a reasonable amount of work done to this driver and the
serial
core
between these two versions.
A few ideas:
- try without dma_rx_complete() calling p->dma->rx_dma(p)
- revert 90b8596ac46043e4a782d9111f5b285251b13756
- Try the revert in
https://lore.kernel.org/all/316ab583-d217-a332-d161-8225b0cee227@xxxxxxxxxx/2-0001-Revert-serial-8250-use-THRE-__stop_tx-also-with-DMA.patch
     (for e8ffbb71f783 and f8d6e9d3ca5c)

But finding the culprit with git bisect would be the most helpful here.

Bisect wasn't an easy option as I'd applied a pile of patches after the
interesting commits for our system to run.
I'm not a git master :)

So I just started reverting the patches that had been applied to the 8250
folder.
Worked backwards from head.

After reverting 57e9af7831dcf211c5c689c2a6f209f4abdf0bce
serial: 8250_dma: Fix DMA Rx rearm race

Things started to work again.

I reset everything and then just reverted that individual patch and things
work.
So that looks like the culprit..
Okay, that also worked great, thanks a lot. It gives something concrete to
work with to find a fix.

The commit itself looks very much correct, however, my guess is that when
serial8250_rx_dma_flush() does dmaengine_pause() dma_status somehow
does not become DMA_PAUSED which leads to not properly flushing DMA Rx
transaction. Can you try the following debug patch (if my guess is
correct, besides triggering the WARN_ON_ONCE, it also works around the
issue):

[PATCH] DEBUG: 8250_dma: Detect DMA_PAUSED vs DMA_IN_PROGRESS inconsistency

Detect DMA state not returning DMA_PAUSED after dmaengine_pause() was
issued.

Not intended for upstream.
Thanks Ilpo
    I can confirm that the patch below works. Got the WARN_ON_ONCE dump so it's
taking that path.
I think the problem here is that the pl330 DMA controller thats in these SOC's
doesn't "really" support pause according to the driver.
For this flush usecase, it is enough to support pause + read residue +
terminate which is supported according to the function comment for
pl330_pause().

Yep agree given the 8250 serial code immediately terminates the dma after the pause during the flush anyway..

It doesn't look like it can ever set "DMA_PAUSED"
Why not? It pauses the transfer and is even able to allow reading the
transferred byte count.

What it is claimed to not be able to do is resume. Note that w/o resume
serial8250_tx_dma() XON/XOFF processing will not work but that's
unrelated to this issue (I'll probably need to do another patch to fix
that on 8250 DMA side).

Yep I just meant it doesn't look capable of reporting DMA_PAUSED
Which your patch below probably fixes.
and it kind of does a stop without a resume capability so must be terminated after this.
Though I'm not sure of the implications of reporting paused without resume capability.
Kind of an odd one..
I'll check your pl330 patch out tomorrow, and see what else might break.

RT

I'm not sure of the flow through of that though.
There is some commenting in the pl330 driver about the pause routine.
Maybe the patch below will help with pl330 DMA status (based on somewhat
limited understanding of all the moving parts):


[PATCH] dmaengine: pl330: Set DMA_PAUSED when pausing

pl330_pause() does not set anything to indicate paused condition which
causes pl330_tx_status() to return DMA_IN_PROGRESS. This breaks 8250
DMA code after the fix in commit 57e9af7831dc ("serial: 8250_dma: Fix
DMA Rx rearm race"). The function comment for pl330_pause() claims
pause is supported but resume is not which is enough for 8250 DMA flush
to work as long as DMA status is reported correctly when appropriate.

Add PAUSED state for descriptor and mark BUSY descriptors with PAUSED
in pl330_pause(). Return DMA_PAUSED from pl330_tx_status() when the
descriptor is PAUSED.

Fixes: 88987d2c7534 ("dmaengine: pl330: add DMA_PAUSE feature")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>

---
  drivers/dma/pl330.c | 18 ++++++++++++++++--
  1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 0d9257fbdfb0..daad25f2c498 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -403,6 +403,12 @@ enum desc_status {
  	 * of a channel can be BUSY at any time.
  	 */
  	BUSY,
+	/*
+	 * Pause was called while descriptor was BUSY. Due to hardware
+	 * limitations, only termination is possible for descriptors
+	 * that have been paused.
+	 */
+	PAUSED,
  	/*
  	 * Sitting on the channel work_list but xfer done
  	 * by PL330 core
@@ -2041,7 +2047,7 @@ static inline void fill_queue(struct dma_pl330_chan *pch)
  	list_for_each_entry(desc, &pch->work_list, node) {
/* If already submitted */
-		if (desc->status == BUSY)
+		if (desc->status == BUSY || desc->status == PAUSED)
  			continue;
ret = pl330_submit_req(pch->thread, desc);
@@ -2326,6 +2332,7 @@ static int pl330_pause(struct dma_chan *chan)
  {
  	struct dma_pl330_chan *pch = to_pchan(chan);
  	struct pl330_dmac *pl330 = pch->dmac;
+	struct dma_pl330_desc *desc;
  	unsigned long flags;
pm_runtime_get_sync(pl330->ddma.dev);
@@ -2335,6 +2342,10 @@ static int pl330_pause(struct dma_chan *chan)
  	_stop(pch->thread);
  	spin_unlock(&pl330->lock);
+ list_for_each_entry(desc, &pch->work_list, node) {
+		if (desc->status == BUSY)
+			desc->status = PAUSED;
+	}
  	spin_unlock_irqrestore(&pch->lock, flags);
  	pm_runtime_mark_last_busy(pl330->ddma.dev);
  	pm_runtime_put_autosuspend(pl330->ddma.dev);
@@ -2425,7 +2436,7 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
  		else if (running && desc == running)
  			transferred =
  				pl330_get_current_xferred_count(pch, desc);
-		else if (desc->status == BUSY)
+		else if (desc->status == BUSY || desc->status == PAUSED)
  			/*
  			 * Busy but not running means either just enqueued,
  			 * or finished and not yet marked done
@@ -2442,6 +2453,9 @@ pl330_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
  			case DONE:
  				ret = DMA_COMPLETE;
  				break;
+			case PAUSED:
+				ret = DMA_PAUSED;
+				break;
  			case PREP:
  			case BUSY:
  				ret = DMA_IN_PROGRESS;




[Index of Archives]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux PPP]     [Linux FS]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Linmodem]     [Device Mapper]     [Linux Kernel for ARM]

  Powered by Linux