Re: [RFC PATCH v1 08/16] drm/radeon: use common fence implementation for fences

Christian König <deathsimple@xxxxxxxxxxx> · Thu, 15 May 2014 11:42:21 +0200

Am 15.05.2014 11:38, schrieb Maarten Lankhorst:
op 15-05-14 11:21, Christian König schreef:
Am 15.05.2014 03:06, schrieb Maarten Lankhorst:
op 14-05-14 17:29, Christian König schreef:
+    /* did fence get signaled after we enabled the sw irq? */
+    if 
(atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq) >= 
fence->seq) {
+        radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+        return false;
+    }
+
+    fence->fence_wake.flags = 0;
+    fence->fence_wake.private = NULL;
+    fence->fence_wake.func = radeon_fence_check_signaled;
+    __add_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+    fence_get(f);
That looks like a race condition to me. The fence needs to be added 
to the wait queue before the check, not after.

Apart from that the whole approach looks like a really bad idea to 
me. How for example is lockup detection supposed to happen with this? 
It's not a race condition because fence_queue.lock is held when this 
function is called.
Ah, I see. That's also the reason why you moved the wake_up_all out 
of the processing function.
Correct. :-)
Lockup's a bit of a weird problem, the changes wouldn't allow core 
ttm code to handle the lockup any more,
but any driver specific wait code would still handle this. I did 
this by design, because in future patches the wait
function may be called from outside of the radeon driver. The 
official wait function takes a timeout parameter,
so lockups wouldn't be fatal if the timeout is set to something like 
30*HZ for example, it would still return
and report that the function timed out.
Timeouts help with the detection of the lockup, but not at all with 
the handling of them.

What we essentially need is a wait callback into the driver that is 
called in non atomic context without any locks held.

This way we can block for the fence to become signaled with a timeout 
and can then also initiate the reset handling if necessary.

The way you designed the interface now means that the driver never 
gets a chance to wait for the hardware to become idle and so never 
has the opportunity to the reset the whole thing.
You could set up a hangcheck timer like intel does, and end up with a 
reliable hangcheck detection that doesn't depend on cpu waits. :-) Or 
override the default wait function and restore the old behavior.

Overriding the default wait function sounds better, please implement it 
this way.

Thanks,
Christian.

~Maarten

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel