Quoting Chris Wilson (2019-11-16 17:51:29)
> The general concept was that intel_timeline.active_count was locked by
> the intel_timeline.mutex. The exception was for power management, where
> the engine->kernel_context->timeline could be manipulated under the
> global wakeref.mutex.
>
> This was quite solid, as we always manipulated the timeline only while
> we held an engine wakeref.
>
> And then we started retiring requests outside of struct_mutex, only
> using the timelines.active_list and the timeline->mutex. There we
> started manipulating intel_timeline.active_count outside of an engine
> wakeref, and so introduced a race between __engine_park() and
> intel_gt_retire_requests(), a race that could result in the
> engine->kernel_context not being added to the active timelines and so
> losing requests, which caused us to keep the system permanently powered
> up [and unloadable].
>
> The race would be easy to close if we could take the engine wakeref for
> the timeline before we retire -- except timelines are not bound to any
> engine and so we would need to keep all active engines awake. The
> alternative is to guard intel_timeline_enter/intel_timeline_exit for use
> outside of the timeline->mutex.

So the idea of taking a wakeref on all active engines during retire is
growing on me. That's the less frequent path and so avoids adding
another set of atomic manipulations to every request.
-Chris
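
For illustration only, a rough sketch of what "take a wakeref on all
active engines during retire" could look like, written against the i915
gt helpers of that era (intel_engine_pm_get_if_awake(),
intel_engine_pm_put(), intel_gt_retire_requests(), the gt-level
for_each_engine() iterator). The wrapper retire_with_engines_awake() is
a hypothetical name for this sketch, the include paths (relative to
drivers/gpu/drm/i915/) are approximate, and this is not the patch that
eventually landed:

#include "gt/intel_engine_pm.h"
#include "gt/intel_gt_requests.h"

/*
 * Sketch only: pin every currently-awake engine so that __engine_park()
 * cannot run while we retire, then drop the temporary wakerefs again.
 */
static void retire_with_engines_awake(struct intel_gt *gt)
{
        struct intel_engine_cs *engine;
        enum intel_engine_id id;
        intel_engine_mask_t awake = 0;

        /* Take a wakeref on each engine that is still awake. */
        for_each_engine(engine, gt, id) {
                if (intel_engine_pm_get_if_awake(engine))
                        awake |= engine->mask;
        }

        /* Parking is excluded, so the kernel_context timeline cannot be lost. */
        intel_gt_retire_requests(gt);

        /* Release the wakerefs, allowing the engines to park as normal. */
        for_each_engine(engine, gt, id) {
                if (awake & engine->mask)
                        intel_engine_pm_put(engine);
        }
}

Taking the wakerefs only on the retire path keeps the per-request
submission code free of extra atomic manipulations, which is the
trade-off argued for above.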