Hi folks Any comments or suggestion about the change? The kernel occasionally _deadlocks_ without it for the DW UART + DW DMAC hardware setup. -Serge(y) On Fri, Aug 02, 2024 at 11:07:51AM +0300, Serge Semin wrote: > The dmaengine_tx_status() method updating the DMA-descriptors state and > eventually calling the Tx-descriptors callback may potentially cause > problems. In particular the deadlock was discovered in DW UART 8250 device > interacting with DW DMA controller channels. The call-trace causing the > deadlock is: > > serial8250_handle_irq() > uart_port_lock_irqsave(port); ----------------------+ > handle_rx_dma() | > serial8250_rx_dma_flush() | > __dma_rx_complete() | > dmaengine_tx_status() | > dwc_scan_descriptors() | > dwc_complete_all() | > dwc_descriptor_complete() | > dmaengine_desc_callback_invoke() | > cb->callback(cb->callback_param); | > || | > dma_rx_complete(); | > uart_port_lock_irqsave(port); ----+ <- Deadlock! > > So in case if the DMA-engine finished working at some point before the > serial8250_rx_dma_flush() invocation and the respective tasklet hasn't > been executed yet, then calling the dmaengine_tx_status() will cause the > DMA-descriptors status update and the Tx-descriptor callback invocation. > Generalizing the case up: if the dmaengine_tx_status() method callee and > the Tx-descriptor callback refer to the related critical section, then > calling dmaengine_tx_status() from the Tx-descriptor callback will > inevitably cause a deadlock around the guarding lock as it happens in the > Serial 8250 DMA implementation above. (Note the deadlock doesn't happen > very often, but can be eventually discovered if the being received data > size is greater than the Rx DMA-buffer size defined in the 8250_dma.c > driver. In my case reducing the Rx DMA-buffer size increased the deadlock > probability.) > > The easiest way to fix the deadlock was to just remove the Tx-descriptors > state update from the DW DMA-engine Tx-descriptor status method > implementation, as the most of the DMA-engine drivers imply. After this > fix is applied the Tx-descriptors status will be only updated in the > framework of the dwc_scan_descriptors() method called from the tasklet > handling the deferred DMA-controller IRQ. > > Signed-off-by: Serge Semin <fancer.lancer@xxxxxxxxx > > --- > > Note I have doubts whether it's the best possible solution of the problem > since the client-driver deadlock is resolved here by fixing the DMA-engine > provider code. But I failed to find any reasonable solution in the 8250 > DMA implementation. > > Moreover the suggested fix cause a weird outcome - under the high-speed > and heavy serial transfers the next error is printed to the log sometimes: > > > dma dma0chan0: BUG: XFER bit set, but channel not idle! > > That's why the patch submitted as RFC. Do you have any better idea in mind > to prevent the nested lock? > > Cc: Viresh Kumar <vireshk@xxxxxxxxxx> > Cc: Andy Shevchenko <andriy.shevchenko@xxxxxxxxxxxxxxx> > CC: Andy Shevchenko <andy@xxxxxxxxxx> > Cc: Vinod Koul <vkoul@xxxxxxxxxx> > Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > Cc: Jiri Slaby <jirislaby@xxxxxxxxxx> > Cc: "Ilpo Järvinen" <ilpo.jarvinen@xxxxxxxxxxxxxxx> > Cc: dmaengine@xxxxxxxxxxxxxxx > Cc: linux-serial@xxxxxxxxxxxxxxx > Cc: linux-kernel@xxxxxxxxxxxxxxx > --- > drivers/dma/dw/core.c | 6 ------ > 1 file changed, 6 deletions(-) > > diff --git a/drivers/dma/dw/core.c b/drivers/dma/dw/core.c > index 5f7d690e3dba..4b3402156eae 100644 > --- a/drivers/dma/dw/core.c > +++ b/drivers/dma/dw/core.c > @@ -925,12 +925,6 @@ dwc_tx_status(struct dma_chan *chan, > struct dw_dma_chan *dwc = to_dw_dma_chan(chan); > enum dma_status ret; > > - ret = dma_cookie_status(chan, cookie, txstate); > - if (ret == DMA_COMPLETE) > - return ret; > - > - dwc_scan_descriptors(to_dw_dma(chan->device), dwc); > - > ret = dma_cookie_status(chan, cookie, txstate); > if (ret == DMA_COMPLETE) > return ret; > -- > 2.43.0 >