10.06.2021 14:04, Mikko Perttunen wrote:
> Add a new property for jobs to enable or disable recovery i.e.
> CPU increments of syncpoints to max value on job timeout. This
> allows for a more solid model for hanged jobs, where userspace
> doesn't need to guess if a syncpoint increment happened because
> the job completed, or because job timeout was triggered.

Userspace should always get a proper timeout. The threshold should be wrapped into a fence, and the fence's error state should be set to -ETIMEDOUT.

> On job timeout, we stop the channel, NOP all future jobs on the
> channel using the same syncpoint, mark the syncpoint as locked
> and resume the channel from the next job, if any.
>
> The future jobs are NOPed, since because we don't do the CPU
> increments, the value of the syncpoint is no longer synchronized,
> and any waiters would become confused if a future job incremented
> the syncpoint. The syncpoint is marked locked to ensure that any
> future jobs cannot increment the syncpoint either, until the
> application has recognized the situation and reallocated the
> syncpoint.

It should be much easier to switch to the DRM scheduler, removing lots of the old code instead of updating it with new quirks that are difficult to follow.
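
To make the timeout semantics discussed above concrete, here is a rough userspace model in plain C. It is only a sketch: every name in it (model_syncpt, model_fence, model_job, model_job_timeout and so on) is hypothetical and is not a host1x, dma-fence or DRM scheduler API, and the -EBUSY returned for a locked syncpoint is an arbitrary choice made for the illustration. It shows a timed-out job's fence being signaled with -ETIMEDOUT without any CPU increment of the syncpoint, the syncpoint being marked locked, and later jobs on the same syncpoint being NOPed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; none of these exist in the kernel. */
struct model_syncpt {
	unsigned int value;	/* current syncpoint value */
	bool locked;		/* set on timeout, cleared only by reallocation */
};

struct model_fence {
	unsigned int threshold;	/* syncpoint value meaning "job completed" */
	int error;		/* 0, or a negative errno */
	bool signaled;
};

struct model_job {
	struct model_syncpt *sp;
	struct model_fence fence;
	bool nop;		/* true once the job has been turned into a NOP */
};

/* Returns 0 if not signaled, 1 if signaled cleanly, negative errno on error. */
int model_fence_status(const struct model_fence *f)
{
	if (!f->signaled)
		return 0;
	return f->error ? f->error : 1;
}

/*
 * Handle a timeout of jobs[hung]. The syncpoint value is deliberately left
 * untouched (no CPU increments); the waiter learns about the failure from
 * the fence error instead.
 */
void model_job_timeout(struct model_job *jobs, size_t n, size_t hung)
{
	struct model_syncpt *sp;
	size_t i;

	assert(jobs != NULL && hung < n);
	sp = jobs[hung].sp;

	jobs[hung].fence.error = -ETIMEDOUT;
	jobs[hung].fence.signaled = true;
	sp->locked = true;

	/* NOP every later job that would increment the same syncpoint. */
	for (i = hung + 1; i < n; i++) {
		if (jobs[i].sp == sp)
			jobs[i].nop = true;
	}
}

/* Refuse new jobs on a locked syncpoint (error code chosen arbitrarily). */
int model_submit(const struct model_job *job)
{
	return job->sp->locked ? -EBUSY : 0;
}
```

In this model a waiter never has to infer a timeout from the syncpoint value: it checks the fence status and gets -ETIMEDOUT directly, while the syncpoint value stays where the hardware left it.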