On Wed, Jul 23, 2014 at 8:52 AM, Christian König <christian.koenig@xxxxxxx> wrote:
>> In the preliminary patches where I can sync radeon with other GPU's I've
>> been very careful in all the places that call into fences, to make sure that
>> radeon wouldn't try to handle lockups for a different (possibly also radeon)
>> card.
>
> That's actually not such a good idea.
>
> In case of a lockup we need to handle the lockup cause otherwise it could
> happen that radeon waits for the lockup to be resolved and the lockup
> handling needs to wait for a fence that's never signaled because of the
> lockup.

I thought the plan for now is that each driver handles lockups itself. So if
any batch gets stuck for too long (whether it's our own GPU that's stuck or
whether we're somehow stuck on a fence from a second GPU doesn't matter), the
driver steps in with a reset and signals completion to all of its own fences
that have been caught in that pile-up. As long as each driver participating in
fencing has a means to abort/reset, we'll eventually get unstuck.

Essentially every driver has to guarantee that, assuming all the fences it
depends on eventually complete, it _will_ complete its own fences no matter
what.

For now this should be good enough, but for ARB_robustness, or for people who
care a bit about their compute results, we need reliable notification to
userspace that a reset happened. I think we could add a new "aborted" fence
state for that case and then propagate it. But given how tricky the code to
compute reset victims in i915 already is, I think we should leave this out for
now, and even later on make it strictly opt-in.

-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
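A minimal Python sketch of the per-driver rule described above, under a toy single-threaded model. Everything here is hypothetical and not any kernel API: the `Driver`, `Fence` and `FenceState` names, the `timeout` and `opt_in_abort` parameters, and the polling loop are assumptions for illustration. It shows that if each driver completes all of its own pending fences on reset, a driver stuck on another driver's fence still gets unstuck, and how an opt-in "aborted" state could propagate to dependent fences.

```python
from enum import Enum


class FenceState(Enum):
    PENDING = "pending"
    SIGNALED = "signaled"
    ABORTED = "aborted"  # hypothetical opt-in state from the proposal


class Fence:
    def __init__(self, owner, deps=()):
        self.owner = owner        # the driver responsible for completing it
        self.deps = list(deps)    # fences (possibly from other drivers) it waits on
        self.state = FenceState.PENDING

    def done(self):
        return self.state is not FenceState.PENDING


class Driver:
    """Toy model of one driver's fence handling; all names are hypothetical."""

    def __init__(self, name, timeout, opt_in_abort=False):
        self.name = name
        self.timeout = timeout
        self.opt_in_abort = opt_in_abort
        self.hung = False         # simulates this driver's own GPU being stuck
        self.pending = []         # (fence, submit_time), oldest first

    def submit(self, now, deps=()):
        fence = Fence(self, deps)
        self.pending.append((fence, now))
        return fence

    def _final_state(self, fence, reset):
        # Without opt-in, every completion looks like a plain signal.
        if not self.opt_in_abort:
            return FenceState.SIGNALED
        if reset or any(d.state is FenceState.ABORTED for d in fence.deps):
            return FenceState.ABORTED
        return FenceState.SIGNALED

    def poll(self, now):
        # Retire work whose dependencies completed and which our GPU executed.
        still_pending = []
        for fence, submitted in self.pending:
            if not self.hung and all(d.done() for d in fence.deps):
                fence.state = self._final_state(fence, reset=False)
            else:
                still_pending.append((fence, submitted))
        self.pending = still_pending
        # Oldest job stuck too long? Whether on our own GPU or on a foreign
        # fence doesn't matter: reset.
        if self.pending and now - self.pending[0][1] > self.timeout:
            self.reset()

    def reset(self):
        # The guarantee: after a reset, every fence we own is complete.
        for fence, _ in self.pending:
            fence.state = self._final_state(fence, reset=True)
        self.pending = []
        self.hung = False


# Driver "a" hangs; driver "b" waits on a's fence.
a = Driver("a", timeout=10, opt_in_abort=True)
b = Driver("b", timeout=100, opt_in_abort=True)
a.hung = True
fa = a.submit(now=0)
fb = b.submit(now=0, deps=[fa])
for t in (5, 11):
    a.poll(t)
    b.poll(t)
print(fa.state, fb.state)
```

With `opt_in_abort=False` both fences would simply end up `SIGNALED`, which matches the "signal completion after a reset" behavior proposed for now; the aborted state only becomes visible to those who opt in.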