On Sat, Mar 22, 2014 at 06:52:25PM +0100, Daniel Vetter wrote: > On Fri, Mar 21, 2014 at 07:33:59PM +0200, Mika Kuoppala wrote: > > Hi, > > > > Daniel Vetter <daniel.vetter@xxxxxxxx> writes: > > > > > There's an entire pile of issues in here: > > > > > > - Use the main RING_HEAD register, not ACTHD. ACTHD points at the gtt > > > offset of the batch buffer when a batch is executed. Semaphores are > > > always emitted to the main ring, so we always want to look at that. > > > > The ipehr check should make sure that we are at the ring and > > acthd should not be at batch. > > > > Regardless I agree that RING_HEAD is much more safer. When I have > > tried to get bottom of the snb reset hang, I have noticed that > > after reset the acthd is sometimes 0x0c even tho head is 0x00, > > on snb. > > Hm, should we maybe mask out the lower bits, too? If you can confirm this, > can you please add a follow-up patch? > > > > - Mask the obtained HEAD pointer with the actual ring size, which is > > > much smaller. Together with the above issue this resulted us in > > > trying to dereference a pointer way outside of the ring mmio > > > mapping. The resulting invalid access in interrupt context > > > (hangcheck is executed from timers) lead to a full blown kernel > > > panic. The fbcon panic handler then tried to frob our driver harder, > > > resulting in a full machine hang at least on my snb here where I've > > > stumbled over this. > > > > > > - Handle ring wrapping correctly and be a bit more explicit about how > > > many dwords we're scanning. We probably should also scan more than > > > just 4 ... > > > > ipehr dont update on MI_NOOPS and the head should point to > > the extra MI_NOOP after semaphore sequence. So 4 should be > > enough. Perhaps revisit this when bdw gains semaphores. > > Yeah, Chris also mentioned that the HEAD should point at one dword past > the end of the ring, so should even work when there are no MI_NOOPs. In > any case this code is more robust in case hw engineers suddenly change the > behaviour. > > > > - Space out some of teh computations for readability. > > > > > > This prevents hard-hangs on my snb here. Verdict from QA is still > > > pending, but the symptoms match. > > > > > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > > > Cc: Ben Widawsky <ben@xxxxxxxxxxxx> > > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=74100 > > > > The hard hangs are not so frequent with this patch but > > they are still there. This patch should take care of > > the oops we have been seeing, but there is still > > something else to be found until #74100 can be marked as > > fixed. > > > > Very often after reset, when igt has pushed the quietence > > batches into rings, blitter and video ring heads gets > > moved properly but all updates to hardware status page and to > > the sync registers are missing. And result is hang by hangcheck. > > Repeat enough times and it is a hard hang. > > > > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxx> > > > > Please remove the blowup comment and also update the > > bugzilla tag. > > Yeah, QA also says that it doesn't fix the hard hangs, only seems to > reduce them a bit on certain setups. I've updated the commit message. > > btw are you testing with FBDEV=n? The lack of a fbcon panic handler should > greatly increase the chances that the last few message get correctly > transmitted through other means like netconsole. > > > Patches 1-2 and the followup one are > > > > Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > > Thanks for the review, patches merged. > -Daniel Patch 2 was trivial to implement for gen8. This functionality is a lot less trivial. I volunteered to do patch 2, are you going to port this to gen8? -- Ben Widawsky, Intel Open Source Technology Center _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx