Hi Christian,

On Tue, 29 Jun 2021 13:03:58 +0200
Christian König <christian.koenig@xxxxxxx> wrote:

> On 29.06.21 at 09:34, Boris Brezillon wrote:
> > Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU
> > reset. This leads to extra complexity when we need to synchronize timeout
> > works with the reset work. One solution to address that is to have an
> > ordered workqueue at the driver level that will be used by the different
> > schedulers to queue their timeout work. Thanks to the serialization
> > provided by the ordered workqueue we are guaranteed that timeout
> > handlers are executed sequentially, and can thus easily reset the GPU
> > from the timeout handler without extra synchronization.
>
> Well, we had already tried this and it didn't work the way it was
> expected to.
>
> The major problem is that you not only want to serialize the queue, but
> rather have a single reset for all queues.
>
> Otherwise you schedule multiple resets for each hardware queue. E.g. for
> your 3 hardware queues you would reset the GPU 3 times if all of them
> time out at the same time (which is rather likely).
>
> Using a single delayed work item doesn't work either because you then
> only have one timeout.
>
> What could be done is to cancel all delayed work items from all stopped
> schedulers.

drm_sched_stop() does that already, and since we call drm_sched_stop()
on all queues in the timeout handler, we end up with only one global
reset happening even if several queues report a timeout at the same
time.

Regards,

Boris