On Fri, Jun 26, 2015 at 01:58:11PM +0100, John.C.Harrison@xxxxxxxxx wrote:
> From: John Harrison <John.C.Harrison@xxxxxxxxx>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not be
> a need for a 'signaled' function to be called every time the status is
> queried. Instead, 'something' should be done to enable a signal callback
> from the hardware which will update the state directly. In the case of
> requests, this is the seqno update interrupt. The idea is that this
> callback will only be enabled on demand when something actually tries to
> wait on the fence.
>
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
> One complication here is that the 'poke me' system requires holding a
> reference count on the request to guarantee that it won't be freed
> prematurely. Unfortunately, it is unsafe to decrement the reference
> count from the interrupt handler because if that is the last reference,
> the clean up code gets run and the clean up code is not IRQ friendly.
> Hence, the request is added to a 'please clean me' list that gets
> processed at retire time. Any request in this list simply has its count
> decremented and is then removed from that list.
>
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing code
> immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel outstanding
> requests (e.g. because TDR has reset the ring). These requests will
> never get signalled and so must be removed from the signal list
> manually. This is done by setting a 'cancelled' flag and then calling
> the regular notify/retire code path rather than attempting to duplicate
> the list manipulation and clean up code in multiple places. This also
> avoids any race condition where the cancellation request might occur
> after/during the completion interrupt actually arriving.
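For reference, the scheme described above amounts to roughly the sketch
below. This is an illustration of the idea only, not the actual patch:
the lock, list and field names (fence_lock, fence_signal_list,
fence_unsignal_list, signal_link, unsignal_link, cancelled) and the
helper functions are all invented here.

/*
 * Sketch only: how the 'please poke me' / 'please clean me' lists fit
 * together. All names are illustrative, not taken from the patch.
 */

/* i915_add_request(): arm the fence before emitting the seqno write. */
static void i915_fence_arm(struct intel_engine_cs *ring,
			   struct drm_i915_gem_request *req)
{
	unsigned long flags;

	/* The list holds a reference so the request cannot be freed
	 * while the interrupt handler may still look at it. */
	i915_gem_request_reference(req);

	spin_lock_irqsave(&ring->fence_lock, flags);
	list_add_tail(&req->signal_link, &ring->fence_signal_list);
	spin_unlock_irqrestore(&ring->fence_lock, flags);
}

/* Seqno interrupt handler: signal every request the seqno has passed. */
static void i915_notify_requests(struct intel_engine_cs *ring)
{
	u32 seqno = ring->get_seqno(ring, false);
	struct drm_i915_gem_request *req, *next;
	unsigned long flags;

	spin_lock_irqsave(&ring->fence_lock, flags);
	list_for_each_entry_safe(req, next, &ring->fence_signal_list,
				 signal_link) {
		if (!req->cancelled &&
		    !i915_seqno_passed(seqno, req->seqno))
			continue;

		/* Cancelled requests are unlinked but never signalled. */
		if (!req->cancelled)
			fence_signal(&req->fence);

		/* Done with this request, but dropping the last
		 * reference would run the (non-IRQ-safe) free code, so
		 * defer the unreference to retire time instead. */
		list_del(&req->signal_link);
		list_add_tail(&req->unsignal_link,
			      &ring->fence_unsignal_list);
	}
	spin_unlock_irqrestore(&ring->fence_lock, flags);

	/* Kick any sleepers in __i915_wait_request(). */
	wake_up_all(&ring->irq_queue);
}

/* Retire: process the 'please clean me' list from sleepable context. */
static void i915_fence_retire_deferred(struct intel_engine_cs *ring)
{
	struct drm_i915_gem_request *req;
	unsigned long flags;

	spin_lock_irqsave(&ring->fence_lock, flags);
	while (!list_empty(&ring->fence_unsignal_list)) {
		req = list_first_entry(&ring->fence_unsignal_list,
				       struct drm_i915_gem_request,
				       unsignal_link);
		list_del(&req->unsignal_link);
		spin_unlock_irqrestore(&ring->fence_lock, flags);

		/* Safe to drop the last reference here. */
		i915_gem_request_unreference(req);

		spin_lock_irqsave(&ring->fence_lock, flags);
	}
	spin_unlock_irqrestore(&ring->fence_lock, flags);
}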
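The wait side then only turns the user interrupt on when someone
actually blocks, and runs the notify code once by hand to close the
window where the request completed before the interrupt was enabled.
Again a sketch with invented details (the embedded req->fence in
particular is assumed from the description above):

static int __i915_wait_request(struct drm_i915_gem_request *req)
{
	struct intel_engine_cs *ring = req->ring;
	int ret;

	if (i915_gem_request_completed(req, true))
		return 0;

	/* Enable the user interrupt on demand only. */
	if (!ring->irq_get(ring))
		return -ENODEV;

	/* The seqno may already have popped out before the interrupt
	 * was enabled; scanning the signal list now closes that race. */
	i915_notify_requests(ring);

	ret = wait_event_interruptible(ring->irq_queue,
				       fence_is_signaled(&req->fence));

	ring->irq_put(ring);
	return ret;
}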
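And cancellation (e.g. after a TDR reset) just marks the request and
reuses the same notify path instead of duplicating the list surgery;
'cancelled' is again an invented field name:

static void i915_request_cancel(struct drm_i915_gem_request *req)
{
	req->cancelled = true;

	/* The notify code treats a cancelled request like a completed
	 * one (minus the fence_signal), so it is unlinked and queued
	 * for the deferred unreference exactly as on the normal path. */
	i915_notify_requests(req->ring);
}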
-nightly nop:
Time to exec x 1:        15.000µs (ring=render)
Time to exec x 1:         2.000µs (ring=blt)
Time to exec x 131072:    1.827µs (ring=render)
Time to exec x 131072:    1.555µs (ring=blt)

rq tuning patches nop:
Time to exec x 1:        12.200µs (ring=render)
Time to exec x 1:         1.600µs (ring=blt)
Time to exec x 131072:    1.516µs (ring=render)
Time to exec x 131072:    0.812µs (ring=blt)

interrupt driven nop:
Time to exec x 1:        19.200µs (ring=render)
Time to exec x 1:         5.200µs (ring=blt)
Time to exec x 131072:    2.381µs (ring=render)
Time to exec x 131072:    2.009µs (ring=blt)

So the basic question that is left unanswered from last time is: why
would we want to slow down __i915_wait_request? And enabling IRQs still
generates very high system load when processing the 30-40k IRQs per
second found under some workloads.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre