On 29/01/2020 00:12, Dmitry Osipenko wrote: > 28.01.2020 17:51, Dmitry Osipenko пишет: >> 28.01.2020 17:02, Jon Hunter пишет: >>> >>> On 16/01/2020 20:10, Dmitry Osipenko wrote: >>>> 15.01.2020 12:00, Jon Hunter пишет: >>>>> >>>>> On 14/01/2020 20:33, Dmitry Osipenko wrote: >>>>>> 14.01.2020 18:09, Jon Hunter пишет: >>>>>>> >>>>>>> On 12/01/2020 17:29, Dmitry Osipenko wrote: >>>>>>>> I was doing some experiments with I2C and noticed that Tegra APB DMA >>>>>>>> driver crashes sometime after I2C DMA transfer termination. The crash >>>>>>>> happens because tegra_dma_terminate_all() bails out immediately if pending >>>>>>>> list is empty, thus it doesn't release the half-completed descriptors >>>>>>>> which are getting re-used before ISR tasklet kicks-in. >>>>>>> >>>>>>> Can you elaborate a bit more on how these are getting re-used? What is >>>>>>> the sequence of events which results in the panic? I believe that this >>>>>>> was also reported in the past [0] and so I don't doubt there is an issue >>>>>>> here, but would like to completely understand this. >>>>>>> >>>>>>> Thanks! >>>>>>> Jon >>>>>>> >>>>>>> [0] https://lore.kernel.org/patchwork/patch/675349/ >>>>>>> >>>>>> >>>>>> In my case it happens in the touchscreen driver during of the >>>>>> touchscreen's interrupt handling (in a threaded IRQ handler) + CPU is >>>>>> under load and there is other interrupts activity. So what happens here >>>>>> is that the TS driver issues one I2C transfer, which fails with >>>>>> (apparently bogus) timeout (because DMA descriptor is completed and >>>>>> removed from the pending list, but tasklet not executed yet), and then >>>>>> TS immediately issues another I2C transfer that re-uses the >>>>>> yet-incompleted descriptor. That's my understanding. >>>>> >>>>> OK, but what is the exact sequence that it allowing it to re-use the >>>>> incompleted descriptor? >>>> >>>> TDMA driver DMA Client >>>> >>>> 1. >>>> dmaengine_prep() >>>> >>>> 2. >>>> tegra_dma_desc_get() >>>> dma_desc = kzalloc() >>>> ... >>>> tegra_dma_prep_slave_sg() >>>> INIT_LIST_HEAD(&dma_desc->tx_list); >>>> INIT_LIST_HEAD(&dma_desc->cb_node); >>>> list_add_tail(sgreq->node, >>>> dma_desc->tx_list) >>>> >>>> 3. >>>> dma_async_issue_pending() >>>> >>>> 4. >>>> tegra_dma_tx_submit() >>>> list_splice_tail_init(dma_desc->tx_list, >>>> tdc->pending_sg_req) >>>> >>>> 5. >>>> tegra_dma_isr() >>>> ... >>>> handle_once_dma_done() >>>> ... >>>> sgreq = list_first_entry(tdc->pending_sg_req) >>>> list_del(sgreq->node); >>>> ... >>>> list_add_tail(dma_desc->cb_node, >>>> tdc->cb_desc); >>>> list_add_tail(dma_desc->node, >>>> tdc->free_dma_desc); >>> >>> Isn't this the problem here, that we have placed this on the free list >>> before we are actually done? >>> >>> It seems to me that there could still be a potential race condition >>> between the ISR and the tasklet running. >> >> Yes, this should be addressed by the patch #3 "dmaengine: tegra-apb: >> Prevent race conditions of tasklet vs free list". > > correction (to avoid confusion): it's actually patch #5, my bad Ah OK great! Could be worth making that patch #1 or #2 in the series as theses are somewhat related. Cheers Jon -- nvpublic