Quoting Chris Wilson (2020-05-05 14:48:19)
> +static void await_proxy_work(struct work_struct *work)
> +{
> +	struct await_proxy *ap = container_of(work, typeof(*ap), work);
> +	struct i915_request *rq = ap->request;
> +
> +	del_timer_sync(&ap->timer);
> +
> +	if (ap->fence) {
> +		int err = 0;
> +
> +		/*
> +		 * If the fence is external, we impose a 10s timeout.
> +		 * However, if the fence is internal, we skip the timeout in
> +		 * the belief that all fences are in-order (DAG, no cycles)
> +		 * and we can enforce forward progress by resetting the GPU
> +		 * if necessary. A future fence, provided by userspace, can
> +		 * trivially generate a cycle in the dependency graph, and so
> +		 * cause that entire cycle to deadlock: no forward progress
> +		 * can be made and the driver is kept eternally awake.
> +		 *
> +		 * While we do have a full DAG verifier in the i915_sw_fence
> +		 * debug code, that is perhaps prohibitively expensive
> +		 * (and is necessarily global), so we replace that by
> +		 * checking to see if the endpoints have a recorded cycle.
> +		 */
> +		if (dma_fence_is_i915(ap->fence)) {
> +			struct i915_request *signal = to_request(ap->fence);
> +
> +			rcu_read_lock();
> +			if (intel_timeline_sync_is_later(rcu_dereference(signal->timeline),
> +							 &rq->fence)) {
> +				i915_sw_fence_set_error_once(&signal->submit,
> +							     -EDEADLK);
> +				err = -EDEADLK;
> +			}
> +			rcu_read_unlock();

Endpoints are not enough. The check covers the trivial example I made
for testing, but only that. I think for this to be safe we do need the
full DAG verifier.

Oh well, by the time Tvrtko has finished complaining about it being
recursive it might not be so terrible.
-Chris
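
For illustration only, a minimal standalone sketch of why the endpoint
check above is insufficient: the `node` struct and `has_cycle()` helper
are hypothetical stand-ins, not driver code. With requests A -> B -> C
-> A, the endpoints A and C never directly record each other in their
timeline sync maps, so an endpoint-only test (as the hunk does via
intel_timeline_sync_is_later) reports no cycle, while a full DFS walk
of the dependency edges finds the back edge immediately:

	#include <stdbool.h>
	#include <stdio.h>

	#define MAX_DEPS 8

	enum mark { WHITE, GREY, BLACK }; /* unvisited / on stack / done */

	struct node {
		const char *name;
		struct node *deps[MAX_DEPS]; /* fences this request waits on */
		int ndeps;
		enum mark mark;
	};

	/* DFS: reaching a GREY node again is a back edge, i.e. a cycle. */
	static bool has_cycle(struct node *n)
	{
		int i;

		if (n->mark == GREY)
			return true;
		if (n->mark == BLACK)
			return false;

		n->mark = GREY;
		for (i = 0; i < n->ndeps; i++)
			if (has_cycle(n->deps[i]))
				return true;
		n->mark = BLACK;

		return false;
	}

	int main(void)
	{
		/* A -> B -> C -> A: the cycle closes through the
		 * intermediate B, invisible to an endpoint-only check. */
		struct node a = { .name = "A" };
		struct node b = { .name = "B" };
		struct node c = { .name = "C" };

		a.deps[a.ndeps++] = &b;
		b.deps[b.ndeps++] = &c;
		c.deps[c.ndeps++] = &a;

		printf("cycle: %s\n", has_cycle(&a) ? "yes" : "no");
		return 0;
	}

The recursion in this sketch is exactly the property objected to in the
reply; a production verifier would want an iterative walk or a bounded
depth, and would also need to cope with edges changing concurrently.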