On 23/02/16 23:13, Laurent Pinchart wrote:
> Hi Tomi,
>
> Thank you for the patch.
>
> On Friday 19 February 2016 11:47:38 Tomi Valkeinen wrote:
>> A DMM timeout "timed out waiting for done" has been observed on DRA7
>> devices. The timeout happens rarely, and only when the system is under
>> heavy load.
>>
>> Debugging showed that the timeout can be made to happen much more
>> frequently by optimizing the DMM driver, so that there's almost no code
>> between writing the last DMM descriptors to RAM, and writing to DMM
>> register which starts the DMM transaction.
>>
>> The current theory is that a wmb() does not properly ensure that the
>> data written to RAM is observable by all the components in the system.
>>
>> This DMM timeout has caused interesting (and rare) bugs as the error
>> handling was not functioning properly (the error handling has been fixed
>> in previous commits):
>>
>>  * If a DMM timeout happened when a GEM buffer was being pinned for
>>    display on the screen, a timeout error would be shown, but the driver
>>    would continue programming DSS HW with broken buffer, leading to
>>    SYNCLOST floods and possible crashes.
>>
>>  * If a DMM timeout happened when other user (say, video decoder) was
>>    pinning a GEM buffer, a timeout would be shown but if the user
>>    handled the error properly, no other issues followed.
>>
>>  * If a DMM timeout happened when a GEM buffer was being released, the
>>    driver does not even notice the error, leading to crashes or hang
>>    later.
>>
>> This patch adds wmb() and readl() calls after the last bit is written to
>> RAM, which should ensure that the execution proceeds only after the data
>> is actually in RAM, and thus observable by DMM.
>>
>> This patch is a HACK, as a read-back should not be needed. Further study
>> is required to understand if DMM is somehow special case and read-back
>> is ok, or if DRA7's memory barriers do not work correctly.
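For readers following the thread, the ordering described in the quoted commit message (write the descriptors, wmb(), read back the last word, then kick the DMM) can be sketched roughly as below. This is not the code from the patch: it is a userspace illustration, the helper name dmm_commit_descriptors() and its parameters are hypothetical, and wmb(), readl() and writel() are local stand-ins for the kernel primitives of the same names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-ins (ASSUMPTION: in the kernel these are provided by
 * <asm/barrier.h> and <linux/io.h>; the versions here only mimic them).
 */
#define wmb() __sync_synchronize()	/* GCC/Clang full-barrier builtin */

static inline uint32_t readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void writel(uint32_t val, volatile void *addr)
{
	*(volatile uint32_t *)addr = val;
}

/*
 * Hypothetical helper, not taken from the driver: copy n descriptor words
 * into the descriptor area, make sure they have landed, then write the
 * register that starts the transaction. Returns the value read back.
 */
static uint32_t dmm_commit_descriptors(volatile uint32_t *desc,
				       const uint32_t *src, size_t n,
				       volatile uint32_t *start_reg,
				       uint32_t desc_addr)
{
	uint32_t readback;
	size_t i;

	assert(n > 0);

	/* 1. Write all descriptor words to RAM. */
	for (i = 0; i < n; i++)
		desc[i] = src[i];

	/* 2. Order the descriptor writes before anything that follows. */
	wmb();

	/*
	 * 3. The read-back the commit message calls a HACK: read the last
	 *    word written so execution only proceeds once it is in RAM.
	 */
	readback = readl(&desc[n - 1]);

	/* 4. Only now start the DMM transaction. */
	writel(desc_addr, start_reg);

	return readback;
}
```

With a four-word descriptor array and a dummy variable standing in for the start register, the helper returns the last descriptor word and leaves the descriptor address in the "register". Whether the read-back in step 3 is actually required on DRA7, or only papers over a barrier problem, is exactly the open question in this thread.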
>
> CONFIG_SOC_DRA7XX selects OMAP_INTERCONNECT and OMAP_INTERCONNECT_BARRIER, but
> dra7xx_map_io() doesn't call omap_barriers_init(). Could that be the root
> cause of the issue ? I don't have access to a DRA7xx system, would you be able
> to test that ?

No idea, but I did dig up discussions about this in my mailbox, and it
seems there's been some work done after I wrote this patch, in "Fix
OMAP4 barrier support" series last summer. I'm not sure if that's only
for OMAP4, though.

I'll drop this patch too from the series, and spend a bit more time on
it. This is again something that's a bit tricky to reproduce and test.

 Tomi