On Wed, 20 Dec 2023 20:59:43 +0100
Ketil Johnsen <ketil.johnsen@xxxxxxx> wrote:

> > +/**
> > + * cs_slot_sync_queue_state_locked() - Synchronize the queue slot priority
> > + * @ptdev: Device.
> > + * @csg_id: Group slot.
> > + * @cs_id: Queue slot.
> > + *
> > + * Queue state is updated on group suspend or STATUS_UPDATE event.
> > + */
> > +static void
> > +cs_slot_sync_queue_state_locked(struct panthor_device *ptdev, u32 csg_id, u32 cs_id)
> > +{
> > +	struct panthor_group *group = ptdev->scheduler->csg_slots[csg_id].group;
> > +	struct panthor_queue *queue = group->queues[cs_id];
> > +	struct panthor_fw_cs_iface *cs_iface =
> > +		panthor_fw_get_cs_iface(group->ptdev, csg_id, cs_id);
> > +
> > +	u32 status_wait_cond;
> > +
> > +	switch (cs_iface->output->status_blocked_reason) {
> > +	case CS_STATUS_BLOCKED_REASON_UNBLOCKED:
> > +		if (queue->iface.input->insert == queue->iface.output->extract &&
> > +		    cs_iface->output->status_scoreboards == 0)
> > +			group->idle_queues |= BIT(cs_id);
> > +		break;
> > +
> > +	case CS_STATUS_BLOCKED_REASON_SYNC_WAIT:
> > +		drm_WARN_ON(&ptdev->base, !list_empty(&group->wait_node));
>
> I think we should remove this drm_WARN_ON(). With my user submission
> experiments, I keep hitting this warning because I'm a bit slow to
> signal a Mali sync object. In other words, I'm keeping a stream blocked
> for a while.
>
> It is quite common to get two rapid job IRQs, e.g. one for a global
> event, and one for a particular CSG event. Depending on timing of the
> scheduled work to deal with the IRQs, I quite often end up with two
> tick_work() being scheduled and executed as a result of this. Both of
> these will see the same stream as CS_STATUS_BLOCKED_REASON_SYNC_WAIT,
> and hence the second will trigger the drm_WARN_ON(), as the first run
> already added the group to the waiting list.
>
> I'm pretty sure we can hit this drm_WARN_ON() when user space starts
> making use of multiple streams per group as well, since two or more
> streams for the same group could both be
> CS_STATUS_BLOCKED_REASON_SYNC_WAIT, thus running into the same issue.

It makes total sense, I'll drop the WARN_ON().

> > +		list_move_tail(&group->wait_node, &group->ptdev->scheduler->groups.waiting);
> > +		group->blocked_queues |= BIT(cs_id);
> > +		queue->syncwait.gpu_va = cs_iface->output->status_wait_sync_ptr;
> > +		queue->syncwait.ref = cs_iface->output->status_wait_sync_value;
> > +		status_wait_cond = cs_iface->output->status_wait & CS_STATUS_WAIT_SYNC_COND_MASK;
> > +		queue->syncwait.gt = status_wait_cond == CS_STATUS_WAIT_SYNC_COND_GT;
> > +		if (cs_iface->output->status_wait & CS_STATUS_WAIT_SYNC_64B) {
> > +			u64 sync_val_hi = cs_iface->output->status_wait_sync_value_hi;
> > +
> > +			queue->syncwait.sync64 = true;
> > +			queue->syncwait.ref |= sync_val_hi << 32;
> > +		} else {
> > +			queue->syncwait.sync64 = false;
> > +		}
> > +		break;
> > +
> > +	default:
> > +		/* Other reasons are not blocking. Consider the queue as runnable
> > +		 * in those cases.
> > +		 */
> > +		break;
> > +	}
> > +}