On Wed, Jan 13, 2016 at 07:04:53PM +0200, Mika Kuoppala wrote:
> Hangcheck is run on irq context and might be active on a

s/irq/process/ (a process context that we pretend is irq-like).

> completely different CPU that is submitting requests. And as
> we have been very careful not to add locking to hangcheck to guard
> against driver failures, we need to be careful with the coherency.
>
> Update the ring's last seqno and add to the request list early, with
> a write memory barrier, before pushing the request to the ring. On
> the hangcheck side, use a read memory barrier before inspecting the
> ring's last seqno. This ensures that when hangcheck inspects the
> state, it will not see an active ring without requests in it,
> and then falsely report it as a missed-irq situation.

You are just moving the race, not closing it, AFAICT.

> ring_idle(struct intel_engine_cs *ring, u32 seqno)
> {
> -	return (list_empty(&ring->request_list) ||
> -		i915_seqno_passed(seqno, ring->last_submitted_seqno));
> +	rmb(); /* Ordering against __i915_add_request() */
> +	return (i915_seqno_passed(seqno, ring->last_submitted_seqno)
> +		|| list_empty(&ring->request_list));

But we can get rid of list_empty() entirely! If we just embrace the
race, we can do this simply by checking for the same idle waiter twice.

http://lists.freedesktop.org/archives/intel-gfx/2016-January/084514.html
is the most recent variant.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
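
For reference, the ordering Mika's patch reaches for can be sketched as
below: publish the bookkeeping stores first, fence them ahead of the
store that makes the ring appear busy, and pair that with the rmb() in
ring_idle(). This is a minimal sketch, using the field and callback
names the driver had in this era (last_submitted_seqno, write_tail);
the exact call sites and surrounding code are assumptions, not the
actual patch:

	/* Submission side (sketch of __i915_add_request()): publish
	 * the bookkeeping before the request becomes visible.
	 */
	ring->last_submitted_seqno = request->seqno;
	list_add_tail(&request->list, &ring->request_list);

	/* Order the stores above before the tail write that makes the
	 * ring appear active to an observer.
	 */
	wmb();

	ring->write_tail(ring, request->tail);

	/* Hangcheck side (the ring_idle() hunk quoted above): the
	 * rmb() is intended to pair with the wmb(), so that a ring
	 * seen as active is seen with its bookkeeping populated.
	 */
	rmb();
	return i915_seqno_passed(seqno, ring->last_submitted_seqno) ||
	       list_empty(&ring->request_list);

Chris's objection that this only moves the race is presumably because
hangcheck does not only read these CPU-side fields: it also samples
hardware ring state, which CPU barriers do not order against these
stores, so a window remains between the two observations.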
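
And a sketch of the "same idle waiter twice" idea Chris points to:
hangcheck already runs periodically, so rather than trying to make a
single sample coherent, it can remember the last seqno it observed and
only flag a missed interrupt when two consecutive passes see the same
stuck waiter. This illustrates the idea only, not the patch in the
linked thread; hangcheck.seqno, get_seqno() and irq_queue follow the
driver's naming of the era, while the helper missed_irq() is assumed:

	static bool missed_irq(struct intel_engine_cs *ring)
	{
		u32 seqno = ring->get_seqno(ring, false);

		if (seqno != ring->hangcheck.seqno) {
			/* Progress (or a racy sample) since the last
			 * pass: remember it and re-check next time.
			 */
			ring->hangcheck.seqno = seqno;
			return false;
		}

		/* Same seqno on two consecutive passes while someone
		 * is still waiting: treat it as a missed interrupt.
		 * No barriers needed; periodic re-sampling embraces
		 * the race instead of trying to close it.
		 */
		return waitqueue_active(&ring->irq_queue);
	}

The appeal is that re-sampling turns a coherency problem into, at
worst, one extra hangcheck interval of latency, with no barriers or
list inspection on the submission path.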