Re: [PATCH 03/33] HACK: drm/omap: fix memory barrier bug in DMM driver

Hi Tomi,

Thank you for the patch.

On Friday 19 February 2016 11:47:38 Tomi Valkeinen wrote:
> A DMM timeout "timed out waiting for done" has been observed on DRA7
> devices. The timeout happens rarely, and only when the system is under
> heavy load.
> 
> Debugging showed that the timeout can be made to happen much more
> frequently by optimizing the DMM driver, so that there's almost no code
> between writing the last DMM descriptors to RAM and writing to the DMM
> register which starts the DMM transaction.
> 
> The current theory is that a wmb() does not properly ensure that the
> data written to RAM is observable by all the components in the system.
> 
> This DMM timeout has caused interesting (and rare) bugs as the error
> handling was not functioning properly (the error handling has been fixed
> in previous commits):
> 
>  * If a DMM timeout happened when a GEM buffer was being pinned for
>    display on the screen, a timeout error would be shown, but the driver
>    would continue programming DSS HW with broken buffer, leading to
>    SYNCLOST floods and possible crashes.
> 
>  * If a DMM timeout happened when another user (say, a video decoder) was
>    pinning a GEM buffer, a timeout would be shown but if the user
>    handled the error properly, no other issues followed.
> 
>  * If a DMM timeout happened when a GEM buffer was being released, the
>    driver did not even notice the error, leading to crashes or hangs
>    later.
> 
> This patch adds wmb() and readl() calls after the last bit is written to
> RAM, which should ensure that the execution proceeds only after the data
> is actually in RAM, and thus observable by DMM.
> 
> This patch is a HACK, as a read-back should not be needed. Further study
> is required to understand whether DMM is somehow a special case where a
> read-back is OK, or whether DRA7's memory barriers do not work correctly.

CONFIG_SOC_DRA7XX selects OMAP_INTERCONNECT and OMAP_INTERCONNECT_BARRIER, but
dra7xx_map_io() doesn't call omap_barriers_init(). Could that be the root cause
of the issue? I don't have access to a DRA7xx system; would you be able to test
that?

> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@xxxxxx>
> ---
>  drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> index 80526dec7b2c..4e04f9487375 100644
> --- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> +++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> @@ -262,6 +262,17 @@ static int dmm_txn_commit(struct dmm_txn *txn, bool wait)
>  	}
> 
>  	txn->last_pat->next_pa = 0;
> +	/* ensure that the written descriptors are visible to DMM */
> +	wmb();
> +
> +	/*
> +	 * NOTE: the wmb() above should be enough, but there seems to be a bug
> +	 * in OMAP's memory barrier implementation, which in some rare cases may
> +	 * cause the writes not to be observable after wmb().
> +	 */
> +
> +	/* read back to ensure the data is in RAM */
> +	readl(&txn->last_pat->next_pa);
> 
>  	/* write to PAT_DESCR to clear out any pending transaction */
>  	writel(0x0, dmm->base + reg[PAT_DESCR][engine->id]);
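
For anyone following the thread without the hardware at hand, the ordering
this hunk is after can be modelled in plain userspace C. This is only a sketch
under stated assumptions: struct pat_desc, struct fake_dmm, read_back() and
txn_commit() are hypothetical names, not the driver's. The C11 release fence
stands in for wmb(), the volatile load for readl(), and the final atomic store
for the writel() that starts the transaction. On cache-coherent userspace
memory it cannot reproduce the DRA7 timeout; it only shows the sequence of
steps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a DMM PAT descriptor living in RAM. */
struct pat_desc {
	uint32_t next_pa;	/* address of the next descriptor, 0 = last */
	uint32_t data;
};

/* Hypothetical stand-in for the register that starts the transaction. */
struct fake_dmm {
	atomic_uintptr_t pat_descr;
};

/* Userspace analogue of readl(): a volatile load the compiler cannot drop. */
static inline uint32_t read_back(const uint32_t *addr)
{
	return *(const volatile uint32_t *)addr;
}

/*
 * Terminate the descriptor chain, order the writes, read the last word
 * back, and only then hand the chain to the "device".
 * Returns the value observed by the read-back.
 */
static uint32_t txn_commit(struct fake_dmm *dmm, struct pat_desc *descs,
			   size_t count)
{
	uint32_t seen;

	assert(count > 0);

	/* mark the last descriptor as the end of the chain */
	descs[count - 1].next_pa = 0;

	/* stand-in for wmb(): descriptor stores are ordered before later stores */
	atomic_thread_fence(memory_order_release);

	/* stand-in for readl(&txn->last_pat->next_pa) */
	seen = read_back(&descs[count - 1].next_pa);

	/* stand-in for the writel() that starts the DMM transaction */
	atomic_store_explicit(&dmm->pat_descr, (uintptr_t)descs,
			      memory_order_release);

	return seen;
}
```

Calling txn_commit() on a small array of descriptors returns the read-back
value (0 for a correctly terminated chain) and leaves the array's address in
the fake register. Whether the real read-back is needed at all is exactly the
open question in the commit message.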

-- 
Regards,

Laurent Pinchart

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel



