Re: [PATCH] dmaengine: cppi41: Fix oops in cppi41_runtime_resume

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 01/12/2017 03:30 PM, Tony Lindgren wrote:
> Commit fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
> together with recent MUSB changes allowed USB and DMA on BeagleBone to idle
> when no cable is connected. But looks like few corner case issues still
> remain.
> 
> Looks like just by repluging USB cable about ten or so times on BeagleBone
> when configured in USB peripheral mode we can get warnings and eventually
> trigger an oops in cppi41 DMA:
> 
> WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:1154 cppi41_runtime_suspend+
> x28/0x38 [cppi41]
> ...
> 
> WARNING: CPU: 0 PID: 14 at drivers/dma/cppi41.c:452
> push_desc_queue+0x94/0x9c [cppi41]
> ...
> 
> Unable to handle kernel NULL pointer dereference at virtual
> address 00000104
> pgd = c0004000
> [00000104] *pgd=00000000
> Internal error: Oops: 805 [#1] SMP ARM
> ...
> [<bf0d92cc>] (cppi41_runtime_resume [cppi41]) from [<c0589838>]
> (__rpm_callback+0xc0/0x214)
> [<c0589838>] (__rpm_callback) from [<c05899ac>] (rpm_callback+0x20/0x80)
> [<c05899ac>] (rpm_callback) from [<c0589460>] (rpm_resume+0x504/0x78c)
> [<c0589460>] (rpm_resume) from [<c058a1a0>] (pm_runtime_work+0x60/0xa8)
> [<c058a1a0>] (pm_runtime_work) from [<c0156120>] (process_one_work+0x2b4/0x808)
> 
> This is because of a race with PM runtime and cppi41_dma_issue_pending()
> as reported by Alexandre Bailon <abailon@xxxxxxxxxxxx> in earlier
> set of patches. Based on mailing list discussions we however came to the
> conclusion that a different fix from Alexandre's fix is needed in order
> to guarantee that DMA is really active when we try to use it.
> 
> To fix the issue, we need to add a driver specific flag as we otherwise
> can have -EINPROGRESS state set by PM runtime and can't rely on
> pm_runtime_active() to tell us when we can use the DMA.
> 
> For reference, this is also documented in Documentation/power/runtime_pm.txt
> in the example at the end of the file as pointed out by Grygorii Strashko
> <grygorii.strashko@xxxxxx>.
> 
> So let's fix the issue by introducing atomic_t active that gets toggled
> in runtime_suspend() and runtime_resume(). And then let's replace the
> pm_runtime_active() checks with atomic_read().
> 
> Note that we may also get transactions queued while in -EINPROGRESS
> state. So we need to check for new elements in cppi41_runtime_resume()
> by replacing list_for_each_entry_safe() with the commonly used test for
> while (!list_empty(&cdd->pending)).
> 
> Fixes: fdea2d09b997 ("dmaengine: cppi41: Add basic PM runtime support")
> Cc: Andy Shevchenko <andy.shevchenko@xxxxxxxxx>
> Cc: Bin Liu <b-liu@xxxxxx>
> Cc: Grygorii Strashko <grygorii.strashko@xxxxxx>
> Cc: Kevin Hilman <khilman@xxxxxxxxxxxx>
> Cc: Patrick Titiano <ptitiano@xxxxxxxxxxxx>
> Cc: Sergei Shtylyov <sergei.shtylyov@xxxxxxxxxxxxxxxxxx>
> Reported-by: Alexandre Bailon <abailon@xxxxxxxxxxxx>
> Signed-off-by: Tony Lindgren <tony@xxxxxxxxxxx>
> ---
>  drivers/dma/cppi41.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/dma/cppi41.c b/drivers/dma/cppi41.c
> --- a/drivers/dma/cppi41.c
> +++ b/drivers/dma/cppi41.c
> @@ -1,3 +1,4 @@
> +#include <linux/atomic.h>
>  #include <linux/delay.h>
>  #include <linux/dmaengine.h>
>  #include <linux/dma-mapping.h>
> @@ -153,6 +154,8 @@ struct cppi41_dd {
>  
>  	/* context for suspend/resume */
>  	unsigned int dma_tdfdq;
> +
> +	atomic_t active;
>  };
>  
>  #define FIST_COMPLETION_QUEUE	93
> @@ -320,7 +323,8 @@ static irqreturn_t cppi41_irq(int irq, void *data)
>  			int error;
>  
>  			error = pm_runtime_get(cdd->ddev.dev);
> -			if (error < 0)
> +			if (((error != -EINPROGRESS) && error < 0) ||
> +			    !atomic_read(&cdd->active))
>  				dev_err(cdd->ddev.dev, "%s pm runtime get: %i\n",
>  					__func__, error);
>  
> @@ -482,7 +486,7 @@ static void cppi41_dma_issue_pending(struct dma_chan *chan)
>  		return;
>  	}
>  

Sry, but even if it looks better it might still race with PM runtime :(

> -	if (likely(pm_runtime_active(cdd->ddev.dev)))
> +	if (likely(atomic_read(&cdd->active)))
>  		push_desc_queue(c);
>  	else


- CPU is here (-EINPROGRESS and active == 0) and then preempted 
- PM runtime will finish cppi41_runtime_resume and clean-up pending descs
- CPU return here and adds desc to the pending queue
- oops

Am I wrong?

>  		pending_desc(c);
> @@ -1003,6 +1007,7 @@ static int cppi41_dma_probe(struct platform_device *pdev)
>  	cdd->ddev.dst_addr_widths = CPPI41_DMA_BUSWIDTHS;
>  	cdd->ddev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;
>  	cdd->ddev.dev = dev;
> +	atomic_set(&cdd->active, 0);
>  	INIT_LIST_HEAD(&cdd->ddev.channels);
>  	cpp41_dma_info.dma_cap = cdd->ddev.cap_mask;
>  
> @@ -1151,6 +1156,7 @@ static int __maybe_unused cppi41_runtime_suspend(struct device *dev)
>  {
>  	struct cppi41_dd *cdd = dev_get_drvdata(dev);
>  
> +	atomic_set(&cdd->active, 0);
>  	WARN_ON(!list_empty(&cdd->pending));
>  
>  	return 0;
> @@ -1159,13 +1165,20 @@ static int __maybe_unused cppi41_runtime_suspend(struct device *dev)
>  static int __maybe_unused cppi41_runtime_resume(struct device *dev)
>  {
>  	struct cppi41_dd *cdd = dev_get_drvdata(dev);
> -	struct cppi41_channel *c, *_c;
> +	struct cppi41_channel *c;
>  	unsigned long flags;
>  
> +	atomic_set(&cdd->active, 1);
> +
>  	spin_lock_irqsave(&cdd->lock, flags);
> -	list_for_each_entry_safe(c, _c, &cdd->pending, node) {
> -		push_desc_queue(c);
> +	while (!list_empty(&cdd->pending)) {
> +		c = list_first_entry(&cdd->pending,
> +				     struct cppi41_channel,
> +				     node);
>  		list_del(&c->node);
> +		spin_unlock_irqrestore(&cdd->lock, flags);
> +		push_desc_queue(c);
> +		spin_lock_irqsave(&cdd->lock, flags);
>  	}
>  	spin_unlock_irqrestore(&cdd->lock, flags);
>  
> 

-- 
regards,
-grygorii
--
To unsubscribe from this list: send the line "unsubscribe dmaengine" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux PCI]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux