Re: [PATCH v2] drm/i915: Apply a cond_resched() to the saturated signaler

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 04/04/2017 13:05, Chris Wilson wrote:
If the engine is continually completing nops, we can saturate the
signaler and keep it working indefinitely. This angers the NMI watchdog!

A good example is to disable semaphores on snb and run igt/gem_exec_nop -
the parallel, multi-engine workloads are more than sufficient to hog the
CPU, preventing the system from even processing ICMP echo replies.

v2: Tvrtko dug into cond_resched() on x86 and found that it only
depended upon preempt_cound and not tif_need_resched() - which means
that we would always call schedule() at that point.

Fixes: c81d46138da6 ("drm/i915: Convert trace-irq to the breadcrumb waiter")
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
 drivers/gpu/drm/i915/intel_breadcrumbs.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 308c56a021ab..9ccbf26124c6 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -580,6 +580,8 @@ static int intel_breadcrumbs_signaler(void *arg)
 	signaler_set_rtpriority();

 	do {
+		bool do_schedule = true;
+
 		set_current_state(TASK_INTERRUPTIBLE);

 		/* We are either woken up by the interrupt bottom-half,
@@ -626,7 +628,18 @@ static int intel_breadcrumbs_signaler(void *arg)
 			spin_unlock_irq(&b->rb_lock);

 			i915_gem_request_put(request);
-		} else {
+
+			/* If the engine is saturated we may be continually
+			 * processing completed requests. This angers the
+			 * NMI watchdog if we never let anything else
+			 * have access to the CPU. Let's pretend to be nice
+			 * and relinquish the CPU if we burn through the
+			 * entire RT timeslice!
+			 */
+			do_schedule = need_resched();
+		}
+
+		if (unlikely(do_schedule)) {
 			DEFINE_WAIT(exec);

 			if (kthread_should_park())


My thinking was to add a check before the request assignment like:

	rcu_read_lock();
	request = ...;
	if (request && !need_resched())
		request = ...;
	else
		request = NULL;
	rcu_read_unlock();

But this looks correct as well, maybe it is just my preference on what would have been easier to understand.

I trust that you have tested it both for solving the NMI lockup detector issue and that it doesn't affect the signaller latency a lot.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux