[PATCH RFC] dmaengine: dw: Prevent tx-status calling desc callback (Fix UART deadlock!)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The dmaengine_tx_status() method updating the DMA-descriptors state and
eventually calling the Tx-descriptors callback may potentially cause
problems. In particular the deadlock was discovered in DW UART 8250 device
interacting with DW DMA controller channels. The call-trace causing the
deadlock is:

serial8250_handle_irq()
  uart_port_lock_irqsave(port); ----------------------+
  handle_rx_dma()                                     |
    serial8250_rx_dma_flush()                         |
      __dma_rx_complete()                             |
        dmaengine_tx_status()                         |
          dwc_scan_descriptors()                      |
            dwc_complete_all()                        |
              dwc_descriptor_complete()               |
                dmaengine_desc_callback_invoke()      |
                  cb->callback(cb->callback_param);   |
                  ||                                  |
                  dma_rx_complete();                  |
                    uart_port_lock_irqsave(port); ----+ <- Deadlock!

So in case if the DMA-engine finished working at some point before the
serial8250_rx_dma_flush() invocation and the respective tasklet hasn't
been executed yet, then calling the dmaengine_tx_status() will cause the
DMA-descriptors status update and the Tx-descriptor callback invocation.
Generalizing the case up: if the dmaengine_tx_status() method callee and
the Tx-descriptor callback refer to the related critical section, then
calling dmaengine_tx_status() from the Tx-descriptor callback will
inevitably cause a deadlock around the guarding lock as it happens in the
Serial 8250 DMA implementation above. (Note the deadlock doesn't happen
very often, but can be eventually discovered if the being received data
size is greater than the Rx DMA-buffer size defined in the 8250_dma.c
driver. In my case reducing the Rx DMA-buffer size increased the deadlock
probability.)

The easiest way to fix the deadlock was to just remove the Tx-descriptors
state update from the DW DMA-engine Tx-descriptor status method
implementation, as the most of the DMA-engine drivers imply. After this
fix is applied the Tx-descriptors status will be only updated in the
framework of the dwc_scan_descriptors() method called from the tasklet
handling the deferred DMA-controller IRQ.

Signed-off-by: Serge Semin <fancer.lancer@xxxxxxxxx

---

Note I have doubts whether it's the best possible solution of the problem
since the client-driver deadlock is resolved here by fixing the DMA-engine
provider code. But I failed to find any reasonable solution in the 8250
DMA implementation.

Moreover the suggested fix cause a weird outcome - under the high-speed
and heavy serial transfers the next error is printed to the log sometimes:

> dma dma0chan0: BUG: XFER bit set, but channel not idle!

That's why the patch submitted as RFC. Do you have any better idea in mind
to prevent the nested lock?

Cc: Viresh Kumar <vireshk@xxxxxxxxxx>
Cc: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx>
CC: Andy Shevchenko <andy@xxxxxxxxxx>
Cc: Vinod Koul <vkoul@xxxxxxxxxx>
Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
Cc: Jiri Slaby <jirislaby@xxxxxxxxxx>
Cc: "Ilpo Järvinen" <ilpo.jarvinen@xxxxxxxxxxxxxxx>
Cc: dmaengine@xxxxxxxxxxxxxxx
Cc: linux-serial@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
---
 drivers/dma/dw/core.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c
index 5f7d690e3dba..4b3402156eae 100644
--- a/drivers/dma/dw/core.c
+++ b/drivers/dma/dw/core.c
@@ -925,12 +925,6 @@ dwc_tx_status(struct dma_chan *chan,
 	struct dw_dma_chan	*dwc = to_dw_dma_chan(chan);
 	enum dma_status		ret;
 
-	ret = dma_cookie_status(chan, cookie, txstate);
-	if (ret == DMA_COMPLETE)
-		return ret;
-
-	dwc_scan_descriptors(to_dw_dma(chan->device), dwc);
-
 	ret = dma_cookie_status(chan, cookie, txstate);
 	if (ret == DMA_COMPLETE)
 		return ret;
-- 
2.43.0





[Index of Archives]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux PCI]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux