Re: [PATCH 19/19] drm/i915: Sync against the GuC log buffer flush work item on system suspend

On 8/18/2016 6:29 PM, Imre Deak wrote:
On to, 2016-08-18 at 16:54 +0530, Goel, Akash wrote:

On 8/18/2016 4:25 PM, Imre Deak wrote:
On to, 2016-08-18 at 09:15 +0530, Goel, Akash wrote:

On 8/17/2016 9:07 PM, Goel, Akash wrote:


On 8/17/2016 6:41 PM, Imre Deak wrote:
On ke, 2016-08-17 at 18:15 +0530, Goel, Akash wrote:

On 8/17/2016 5:11 PM, Chris Wilson wrote:
On Wed, Aug 17, 2016 at 12:27:30PM +0100, Tvrtko Ursulin wrote:


+int intel_guc_suspend(struct drm_device *dev, bool rpm_suspend)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 	struct intel_guc *guc = &dev_priv->guc;
@@ -1530,6 +1530,12 @@ int intel_guc_suspend(struct drm_device *dev)
 		return 0;
 
 	gen9_disable_guc_interrupts(dev_priv);
+	/* Sync is needed only for the system suspend case, runtime suspend
+	 * case is covered due to rpm get/put calls used around Hw access in
+	 * the work item function.
+	 */
+	if (!rpm_suspend && (i915.guc_log_level >= 0))
+		flush_work(&dev_priv->guc.log.flush_work);

In which case (rpm suspend) the flush_work is idle and this is a noop.
That you have to pass around such state suggests that you are papering
over a bug?
In case of rpm suspend the flush_work may not be a NOOP. We can use the
flush_work for runtime suspend also, but in spite of that we can't
prevent the 'RPM wakelock' asserts, as the work item can get executed
after the rpm ref count drops to zero and before runtime suspend kicks
in (after the autosuspend delay).

For that you had earlier suggested using rpm get/put in the work item
function, around the register access, but with that the flush_work had
to be removed from the suspend hook, otherwise a deadlock can happen.
So the flush_work is done conditionally, for the system suspend case
only, as rpm get/put won't cause a resume of the device in that case;
the deadlock being avoided is sketched below.
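
For clarity, the deadlock would look like this (call chains only; the
work function name is a placeholder, not from the patch):

	/*
	 * intel_runtime_suspend()
	 *   intel_guc_suspend()
	 *     flush_work(&guc->log.flush_work)  <-- waits for the work item
	 *
	 * guc_log_flush_work_func()
	 *   intel_runtime_pm_get()              <-- waits for runtime suspend
	 *                                           to finish, so neither side
	 *                                           can make progress
	 */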

Actually I had discussed this with Imre and prepared this patch as per
his inputs.

There would be this alternative:

Thanks much for suggesting the alternate approach.

Just to confirm whether I understood everything correctly,

in gen9_guc_irq_handler():

	WARN_ON(!intel_runtime_pm_get_if_in_use());
	if (!queue_work(log.flush_work))
		intel_runtime_pm_put();

Used WARN, as we don't expect the device to be suspended at this
juncture, so intel_runtime_pm_get_if_in_use() should return true.
If queue_work() returns 0, then the work item is already pending, so it
won't be queued and hence we can release the rpm ref count right away.


and dropping the reference at the end of the work item. This will be
just like __intel_autoenable_gt_powersave().

This would make the flush_work() a nop in case of runtime_suspend(),
so we can call the flush_work unconditionally; the work item side would
then look something like the sketch below.

Hope I understood it correctly.
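
Untested sketch of the work item side (guc_to_i915(), the function name
and the exact argument types are my placeholders, not the actual patch):

	static void guc_log_flush_work_func(struct work_struct *work)
	{
		struct intel_guc *guc =
			container_of(work, struct intel_guc, log.flush_work);
		struct drm_i915_private *dev_priv = guc_to_i915(guc);

		/* Process the log buffer and ack the flush to the GuC, as
		 * the work item already does in this series.
		 */
		guc_read_update_log_buffer(dev_priv);
		host2guc_logbuffer_flush_complete(guc);

		/* Drop the reference taken in gen9_guc_irq_handler(), like
		 * __intel_autoenable_gt_powersave() does at the end.
		 */
		intel_runtime_pm_put(dev_priv);
	}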

Yes, the above is correct, except for my mistake in handling
intel_runtime_pm_get_if_in_use() returning false, as discussed below.


Hi Imre,

You had suggested using the below code in the irq handler, suspecting
that intel_runtime_pm_get_if_in_use() can return false if the interrupt
gets handled just after the device goes out of use.

	if (intel_runtime_pm_get_if_in_use()) {
		if (!queue_work(log.flush_work))
			intel_runtime_pm_put();
	}

Do you mean to say that the interrupt can come when rpm suspend has
already started, but before the interrupt is disabled from the suspend
hook? Like if the interrupt comes between 1) & 4), then
runtime_pm_get_if_in_use() will return false.
1)	Autosuspend delay elapses (device is marked as suspending)
2)		intel_runtime_suspend
3)			intel_guc_suspend
4)				gen9_disable_guc_interrupts(dev_priv);

No, it can return false anytime the last RPM reference is dropped, that
is even before the autosuspend delay elapses.

Sorry, I missed that pm_runtime_get_if_in_use() will return 0 if the RPM
ref count has dropped to 0, even if the device is still in the runtime
active state (as the autosuspend delay has not elapsed).

But that still makes the likelihood of a missed work item scheduling
small, because 1) we want to reduce the autosuspend delay considerably
from the current 10 sec and 2) because of what you say below about the
GPU actually idling before the RPM refcount goes to 0.

If the above hypothesis is correct, then it implies that the interrupt
has to come after the autosuspend delay has elapsed for the above
scenario to arise.

I think it would be unlikely for the interrupt to come so late, because
the device would have gone idle just before the autosuspend period
started and so no GuC submissions would have been done after that.

Right.

So the probability of missing a work item should be very low and we can
bear that.

I haven't looked into what the consequence of missing a work item is,
you know this better. In any case - since it is still a possibility -
if it's a problem you could still make sure in intel_guc_suspend() that
any pending work is completed, by calling guc_read_update_log_buffer()
and host2guc_logbuffer_flush_complete() if necessary after disabling
interrupts in intel_guc_suspend().

Actually, ideally guc_read_update_log_buffer() and
host2guc_logbuffer_flush_complete() should be called only if the work
item was actually missed, so we will have to detect the missing of the
work item; a sketch of that is below.
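
Untested sketch of the idea (the 'flush_signalled' flag is hypothetical,
set in the irq handler and cleared by the work item, just to illustrate
the detection; elided parts are marked):

	int intel_guc_suspend(struct drm_device *dev)
	{
		struct drm_i915_private *dev_priv = to_i915(dev);
		struct intel_guc *guc = &dev_priv->guc;

		/* ... early-out checks as in the existing function ... */

		gen9_disable_guc_interrupts(dev_priv);

		/* With the interrupt now disabled no new flush work can be
		 * queued, so complete a flush the GuC signalled but that was
		 * never serviced.
		 */
		if (i915.guc_log_level >= 0 && guc->log.flush_signalled) {
			guc->log.flush_signalled = false;
			guc_read_update_log_buffer(dev_priv);
			host2guc_logbuffer_flush_complete(guc);
		}

		/* ... rest of the existing suspend sequence ... */
		return 0;
	}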

Ok. But note that missing an interrupt when runtime suspending is not
unimaginable in any case, since interrupts can get disabled (and
cleared) before they would get serviced.

Fine, a missed interrupt will always be a possibility.

Isn't the original implementation, i.e. conditional flushing of the work
item for the system suspend case, a simpler & cleaner solution?

Yes, perhaps, especially with the missed work item detection. How about
making the log.flush_wq freezable? Then we could forgo the flush in
both runtime and system suspend.

Thanks for the inputs. Sorry, I am not familiar with freezable WQ
semantics, but after looking at the code this is what I understood :-
1. Freezable workqueues will be frozen before the system suspend
   callbacks are invoked for the devices.
2. Any work item queued after the WQ is marked frozen will be scheduled
   later, on resume.
3. But if a work item was already present in the freezable workqueue
   before it was frozen and it did not complete, then system suspend
   itself will be aborted.
4. So if the log.flush_wq is marked as freezable, then flushing the
   work item will not be required for the system suspend case.
   And the runtime suspend case is already covered with rpm get/put
   around the register access in the work item function.

It seems there are 2 config options, CONFIG_SUSPEND_FREEZER and
CONFIG_FREEZER, which have to be enabled for all of the above to happen.
If these config options will always be enabled, then probably marking
log.flush_wq as freezable would work, as in the sketch below.
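
Untested sketch of the allocation change (wq name and WQ_HIGHPRI are
illustrative, WQ_FREEZABLE is the only addition being discussed):

	/* Mark the log flush workqueue freezable, so it is frozen before
	 * the system suspend callbacks of the devices run; anything queued
	 * after that point is deferred until resume.
	 */
	guc->log.flush_wq = alloc_ordered_workqueue("i915-guc_log",
						    WQ_HIGHPRI | WQ_FREEZABLE);
	if (!guc->log.flush_wq)
		return -ENOMEM;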

Please kindly confirm whether I understood correctly or not;
accordingly I will proceed further.

Best regards
Akash



--Imre

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



