Hi Tomi,

Thank you for the patch.

On Friday 19 February 2016 11:47:38 Tomi Valkeinen wrote:
> A DMM timeout "timed out waiting for done" has been observed on DRA7
> devices. The timeout happens rarely, and only when the system is under
> heavy load.
>
> Debugging showed that the timeout can be made to happen much more
> frequently by optimizing the DMM driver, so that there's almost no code
> between writing the last DMM descriptors to RAM, and writing to DMM
> register which starts the DMM transaction.
>
> The current theory is that a wmb() does not properly ensure that the
> data written to RAM is observable by all the components in the system.
>
> This DMM timeout has caused interesting (and rare) bugs as the error
> handling was not functioning properly (the error handling has been fixed
> in previous commits):
>
>  * If a DMM timeout happened when a GEM buffer was being pinned for
>    display on the screen, a timeout error would be shown, but the driver
>    would continue programming DSS HW with broken buffer, leading to
>    SYNCLOST floods and possible crashes.
>
>  * If a DMM timeout happened when other user (say, video decoder) was
>    pinning a GEM buffer, a timeout would be shown but if the user
>    handled the error properly, no other issues followed.
>
>  * If a DMM timeout happened when a GEM buffer was being released, the
>    driver does not even notice the error, leading to crashes or hang
>    later.
>
> This patch adds wmb() and readl() calls after the last bit is written to
> RAM, which should ensure that the execution proceeds only after the data
> is actually in RAM, and thus observable by DMM.
>
> This patch is a HACK, as a read-back should not be needed. Further study
> is required to understand if DMM is somehow special case and read-back
> is ok, or if DRA7's memory barriers do not work correctly.

CONFIG_SOC_DRA7XX selects OMAP_INTERCONNECT and OMAP_INTERCONNECT_BARRIER, but
dra7xx_map_io() doesn't call omap_barriers_init().
Could that be the root cause of the issue? I don't have access to a DRA7xx
system, would you be able to test that?

> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@xxxxxx>
> ---
>  drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> index 80526dec7b2c..4e04f9487375 100644
> --- a/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> +++ b/drivers/gpu/drm/omapdrm/omap_dmm_tiler.c
> @@ -262,6 +262,17 @@ static int dmm_txn_commit(struct dmm_txn *txn, bool wait)
>  	}
>
>  	txn->last_pat->next_pa = 0;
> +	/* ensure that the written descriptors are visible to DMM */
> +	wmb();
> +
> +	/*
> +	 * NOTE: the wmb() above should be enough, but there seems to be a bug
> +	 * in OMAP's memory barrier implementation, which in some rare cases may
> +	 * cause the writes not to be observable after wmb().
> +	 */
> +
> +	/* read back to ensure the data is in RAM */
> +	readl(&txn->last_pat->next_pa);
>
>  	/* write to PAT_DESCR to clear out any pending transaction */
>  	writel(0x0, dmm->base + reg[PAT_DESCR][engine->id]);

-- 
Regards,

Laurent Pinchart

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
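
For anyone following the thread without the driver source at hand, the
ordering the patch is trying to enforce can be sketched in user space roughly
as below. This is only an illustration under stated assumptions, not the
driver code: struct fake_pat, fake_pat_descr, fake_wmb(), fake_readl() and
fake_commit() are all hypothetical names, the C11 fence only approximates
wmb() between CPU threads, and nothing here models whether a device such as
the DMM actually observes the data in RAM, which is the open question.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PAT descriptor; not the driver's structure. */
struct fake_pat {
	uint32_t next_pa;
	uint32_t data;
};

/* Hypothetical stand-in for the PAT_DESCR register. */
static volatile uint32_t fake_pat_descr;

/* User-space approximation of wmb(): order earlier stores before later ones. */
static inline void fake_wmb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* User-space approximation of readl(): a volatile 32-bit load. */
static inline uint32_t fake_readl(const volatile uint32_t *addr)
{
	return *addr;
}

/*
 * Terminate the descriptor chain, issue the barrier, read the last write
 * back, and only then touch the "register" that would start the engine.
 * Returns the value read back from next_pa.
 */
static uint32_t fake_commit(struct fake_pat *last, uint32_t first_pa)
{
	uint32_t readback;

	assert(last != NULL);

	last->next_pa = 0;	/* last descriptor write to "RAM" */
	fake_wmb();		/* what should be sufficient on its own */
	readback = fake_readl(&last->next_pa);	/* the read-back workaround */

	fake_pat_descr = 0;		/* clear any pending transaction */
	fake_pat_descr = first_pa;	/* assumed "start transaction" write */

	return readback;
}
```

The point of the sketch is only the sequence: final descriptor store, barrier,
read-back, register write. If the barrier behaved as expected on DRA7, the
read-back step would be redundant.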