On Thu, 5 Nov 2020 13:27:04 +0000 Steven Price <steven.price@xxxxxxx> wrote: > > + old_status = atomic_xchg(&queue->status, > > + PANFROST_QUEUE_STATUS_STOPPED); > > + WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE && > > + old_status != PANFROST_QUEUE_STATUS_STOPPED); > > + if (old_status == PANFROST_QUEUE_STATUS_STOPPED) > > + goto out; > > NIT: It's slightly cleaner if you swap the above lines, i.e.: > > if (old_status == PANFROST_QUEUE_STATUS_STOPPED) > goto out; > WARN_ON(old_status != PANFROST_QUEUE_STATUS_ACTIVE); I agree. > > > + > > + drm_sched_stop(&queue->sched, bad); > > + if (bad) > > + drm_sched_increase_karma(bad); > > + > > + stopped = true; > > + > > + /* > > + * Set the timeout to max so the timer doesn't get started > > + * when we return from the timeout handler (restored in > > + * panfrost_scheduler_start()). > > + */ > > + queue->sched.timeout = MAX_SCHEDULE_TIMEOUT; > > + > > +out: > > mutex_unlock(&queue->lock); > > > > return stopped; > > } > > > > +static void panfrost_scheduler_start(struct panfrost_queue_state *queue) > > +{ > > + enum panfrost_queue_status old_status; > > + > > + mutex_lock(&queue->lock); > > + old_status = atomic_xchg(&queue->status, > > + PANFROST_QUEUE_STATUS_STARTING); > > + if (WARN_ON(old_status != PANFROST_QUEUE_STATUS_STOPPED)) > > + goto out; > > The error handling isn't great here - in this case the queue status is > left in _STATUS_STARTING, which at best would lead to another WARN_ON > being hit, but also has the effect of ignoring job faults. Probably the > timeout would eventually get things back to normal. > > Obviously this situation will never occur™, but we can do better either > by continuing with the normal logic below, or even better replacing > atomic_xchg() with an atomic_cmpxchg() (so leave the status alone if not > _STOPPED). Both seem like better error recovery options to me. But keep > the WARN_ON because something has clearly gone wrong if this happens. The second approach doesn't unblock things if we end up with old_status != STOPPED and the queue is really stopped (which shouldn't happen, unless we have a problem in our state machine). I think I'll go for the first option and restart the queue unconditionally (I'm keeping the WARN_ON(), of course).