Re: [PULL REQUEST] ttm fence conversion

Christian König <deathsimple@xxxxxxxxxxx> · Tue, 02 Sep 2014 10:51:27 +0200

Am 01.09.2014 um 20:43 schrieb Maarten Lankhorst:
Hey,

On 01-09-14 18:21, Christian König wrote:
Am 01.09.2014 um 15:33 schrieb Maarten Lankhorst:
Hey,

Op 01-09-14 om 14:31 schreef Christian König:
Please wait a second with that.

I didn't had a chance to test this yet and nobody has yet given it's rb on at least the radeon changes in this branch.
Ok, my fault. I thought it was implicitly acked. I haven't made any functional changes to these patches,
just some small fixups and a fix to make it apply after the upstream removal of  RADEON_FENCE_SIGNALED_SEQ.
Yeah, but the resulting patch looks to complex for my taste and should be simplified a bit more. Here is a more detailed review:

+    wait_queue_t fence_wake;
Only a nitpick, but please fix the indention and maybe add a comment.

+    struct work_struct delayed_irq_work;
Just drop that, the new fall back work item should take care of this when the unfortunate case happens that somebody tries to enable_signaling in the middle of a GPU reset.
I can only drop it if radeon_gpu_reset will always call radeon_irq_set after downgrading to read mode, even if no work needs to be done. :-)

Then again, should be possible.

The fall back handler should take care of the rare condition that we 
can't activate the IRQ because the driver is in a lockup handler.

The issue is that the delayed_irq_work handler needs to take the 
exclusive lock once more and so would block an innocent process for the 
duration of the GPU lockup.

Either reschedule as delayed work item if we can't take the lock 
immediately or just live with the delay of the fall back handler. Since 
IRQs usually don't work correctly immediately after an GPU reset I'm 
pretty sure that the fallback handler will be needed anyway.

  /*
- * Cast helper
- */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
-
-/*
Please define the new cast helper in radeon.h as well.
The ops are only defined in radeon_fence.c, and nothing outside of radeon_fence.c should care about the internals.

Then define this as a function instead, I need a checked cast from a 
fence to a radeon_fence outside of the fence code as well.

      if (!rdev->needs_reset) {
-        up_write(&rdev->exclusive_lock);
+        downgrade_write(&rdev->exclusive_lock);
+        wake_up_all(&rdev->fence_queue);
+        up_read(&rdev->exclusive_lock);
          return 0;
      }
Just drop that as well, no need to wake up anybody here.
Maybe not, but if I have to remove delayed_irq_work I do need to add a radeon_irq_set here.

  downgrade_write(&rdev->exclusive_lock);
+    wake_up_all(&rdev->fence_queue);
Same here, the IB test will wake up all fences for recheck anyway.
Same as previous comment. :-)

+ * radeon_fence_read_seq - Returns the current fence value without updating
+ *
+ * @rdev: radeon_device pointer
+ * @ring: ring index to return the seqno of
+ */
+static uint64_t radeon_fence_read_seq(struct radeon_device *rdev, int ring)
+{
+    uint64_t last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
+    uint64_t last_emitted = rdev->fence_drv[ring].sync_seq[ring];
+    uint64_t seq = radeon_fence_read(rdev, ring);
+
+    seq = radeon_fence_read(rdev, ring);
+    seq |= last_seq & 0xffffffff00000000LL;
+    if (seq < last_seq) {
+        seq &= 0xffffffff;
+        seq |= last_emitted & 0xffffffff00000000LL;
+    }
+    return seq;
+}
Completely drop that and just check the last_seq signaled as set by radeon_fence_activity.
Do you mean call radeon_fence_activity in radeon_fence_signaled? Or should I just use the cached value in radeon_fence_check_signaled.

Just check the cached value, it should be updated by 
radeon_fence_activity immediately before calling this anyway.

I can't call fence_activity in check_signaled, because it would cause re-entrancy in fence_queue.

+        if (!ret)
+            FENCE_TRACE(&fence->base, "signaled from irq context\n");
+        else
+            FENCE_TRACE(&fence->base, "was already signaled\n");
Is all that text tracing necessary? Probably better define a trace point here.
It gets optimized out normally. There's already a tracepoint called in fence_signal.

+    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
+        !rdev->ddev->irq_enabled)
+        return false;
Checking irq_enabled here might not be such a good idea if the fence code don't has a fall back on it's own. What exactly happens if enable_signaling returns false?
I thought irq_enabled couldn't happen under normal circumstances?

Not 100% sure, but I think it is temporary turned off during reset.

Anyway the fence gets treated as signaled if it returns false, and fence_signal will get called.
Thought so, well that's rather bad if we failed to install the IRQ 
handle that we just treat all fences as signaled isn't it?

+static signed long radeon_fence_default_wait(struct fence *f, bool intr,
+                         signed long timeout)
+{
+    struct radeon_fence *fence = to_radeon_fence(f);
+    struct radeon_device *rdev = fence->rdev;
+    bool signaled;
+
+    fence_enable_sw_signaling(&fence->base);
+
+    /*
+     * This function has to return -EDEADLK, but cannot hold
+     * exclusive_lock during the wait because some callers
+     * may already hold it. This means checking needs_reset without
+     * lock, and not fiddling with any gpu internals.
+     *
+     * The callback installed with fence_enable_sw_signaling will
+     * run before our wait_event_*timeout call, so we will see
+     * both the signaled fence and the changes to needs_reset.
+     */
+
+    if (intr)
+        timeout = wait_event_interruptible_timeout(rdev->fence_queue,
+                               ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
+                               timeout);
+    else
+        timeout = wait_event_timeout(rdev->fence_queue,
+                         ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
+                         timeout);
+
+    if (timeout > 0 && !signaled)
+        return -EDEADLK;
+    return timeout;
+}
This at least needs to be properly formated, but I think since we now don't need extra handling any more we don't need an extra wait function as well.
I thought of removing the extra handling, but the -EDEADLK stuff is needed because a deadlock could happen in ttm_bo_lock_delayed_workqueue otherwise if the gpu's really hung there would never be any progress forward.

Hui what? ttm_bo_lock_delayed_workqueue shouldn't call any blocking wait.

Regards,
Christian.

~Maarten

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel