Thank Sven, please have a try with the below patch. I'll send it to review later. The root cause is that channel0 done interrupt may come later after after sdma clock disabled(sdma_load_firmware()), which means clearing channel0 interrupt status in ISR never work, thus infinite interrupt comes out. But if delay the firmware load behind any other driver using sdma probe, the issue is gone because sdma clock enabled again in sdma_alloc_chan_resource() such as SPI driver. Actually, no need trigger interrupt for channel0 since SDMA_H_STATSTOP register already be checked instead. diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c index deea9aa..b5a1ee2 100644 --- a/drivers/dma/imx-sdma.c +++ b/drivers/dma/imx-sdma.c @@ -742,7 +742,7 @@ static int sdma_load_script(struct sdma_engine *sdma, void *buf, int size, spin_lock_irqsave(&sdma->channel_0_lock, flags); bd0->mode.command = C0_SETPM; - bd0->mode.status = BD_DONE | BD_INTR | BD_WRAP | BD_EXTD; + bd0->mode.status = BD_DONE | BD_WRAP | BD_EXTD; bd0->mode.count = size / 2; bd0->buffer_addr = buf_phys; bd0->ext_buffer_addr = address; @@ -1064,7 +1064,7 @@ static int sdma_load_context(struct sdma_channel *sdmac) context->gReg[7] = sdmac->watermark_level; bd0->mode.command = C0_SETDM; - bd0->mode.status = BD_DONE | BD_INTR | BD_WRAP | BD_EXTD; + bd0->mode.status = BD_DONE | BD_WRAP | BD_EXTD; bd0->mode.count = sizeof(*context) / 4; bd0->buffer_addr = sdma->context_phys; bd0->ext_buffer_addr = 2048 + (sizeof(*context) / 4) * channel; On 2019-06-11 at 17:30 +0000, Sven Van Asbroeck wrote: > On Tue, Jun 11, 2019 at 5:09 AM Robin Gong <yibin.gong@xxxxxxx> > wrote: > > > > > > Sven, no any dependency from sdma driver view. The only difference > > between directly loading firmware > > from kernel and rootfs is the former spend more time during kernel > > boot and such timing may cause > > the crash. The issue is not 100% in my side, about 20% possibility, > > which looks like 'timing issue' . Another > > interesting thing is that every time the crash stop at somewhere > > drm, and After I disable ipu and display > > which use drm in i.mx6q.dtsi, the issue is gone on my i.mx6q- > > sabreauto board. > > Could you have a try with below patch as mine? If the issue is gone > > on your side, we could involve drm guys to > > look into it. > When I apply your patch to ipu and display, the crash still happens > on > my device. > But when I disable NFSv4 network filesystem in defconfig, the crash > disappears. > Yet on linux-next, the crash is there again, even if I disable the > IPU or NFSv4. > > My guess: we are chasing ghosts, the crashes are purely timing > related. Things > like disabling the IPU or NFSv4 change boot timing, and this changes > the crash. > > Experiment: If I put msleep(1000) right before the sdma_load_script() > call, then > the crash never happens. And if I comment out the call to > sdma_run_channel0() > in sdma_load_script(), then the crash also does not happen. > > This suggests that the crash is related to the exact timing when > sdma_run_channel0() is called. If it is called too early, this > results > in an 'interrupt storm' on the sdma interrupt handler: it gets called > millions of times in a very short amount of time. > > By adding debug prints, I noticed that the sdma core calls back > sdma_alloc_chan_resources(), later during the boot, when a spi > bus is created. > > Experiment: I paused firmware upload until the first time > sdma_alloc_chan_resources() is called by the core. > I used a struct completion to accomplish this. > > Result: the crash never happens again. > > All this suggests very strongly that sdma_run_channel0() is called > "too early" by the driver. I don't known enough of imx-sdma to > know what is missing during the early call. > > Here is the patch to delay firmware load until the first > sdma_alloc_chan_resources() has completed: > > diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c > index 99d9f431ae2c..ddeded5c3337 100644 > --- a/drivers/dma/imx-sdma.c > +++ b/drivers/dma/imx-sdma.c > @@ -33,6 +33,7 @@ > #include <linux/of_device.h> > #include <linux/of_dma.h> > #include <linux/workqueue.h> > +#include <linux/completion.h> > > #include <asm/irq.h> > #include <linux/platform_data/dma-imx-sdma.h> > @@ -444,6 +445,7 @@ struct sdma_engine { > struct sdma_buffer_descriptor *bd0; > /* clock ratio for AHB:SDMA core. 1:1 is 1, 2:1 is 0*/ > bool clk_ratio; > + struct completion chan_resources_alloced; > }; > > static int sdma_config_write(struct dma_chan *chan, > @@ -1258,6 +1260,7 @@ static void sdma_desc_free(struct virt_dma_desc > *vd) > static int sdma_alloc_chan_resources(struct dma_chan *chan) > { > struct sdma_channel *sdmac = to_sdma_chan(chan); > + struct sdma_engine *sdma = sdmac->sdma; > struct imx_dma_data *data = chan->private; > struct imx_dma_data mem_data; > int prio, ret; > @@ -1310,6 +1313,7 @@ static int sdma_alloc_chan_resources(struct > dma_chan *chan) > if (ret) > goto disable_clk_ahb; > > + complete(&sdma->chan_resources_alloced); > return 0; > > disable_clk_ahb: > @@ -1724,6 +1728,7 @@ static void sdma_load_firmware(const struct > firmware *fw, void *context) > /* In this case we just use the ROM firmware. */ > return; > } > + wait_for_completion(&sdma->chan_resources_alloced); > > if (fw->size < sizeof(*header)) > goto err_firmware; > @@ -2012,6 +2017,7 @@ static int sdma_probe(struct platform_device > *pdev) > return -ENOMEM; > > spin_lock_init(&sdma->channel_0_lock); > + init_completion(&sdma->chan_resources_alloced); > > sdma->dev = &pdev->dev; > sdma->drvdata = drvdata;