12.09.2020 16:31, Mikko Perttunen пишет: ... >> I'm now taking a closer look at this patch and it raises some more >> questions, like for example by looking at the "On job timeout, we stop >> the channel, NOP all future jobs on the channel using the same syncpoint >> ..." through the prism of grate-kernel experience, I'm not sure how it >> could co-exist with the drm-scheduler and why it's needed at all. But I >> think we need a feature-complete version (at least a rough version), so >> that we could start the testing, and then it should be easier to review >> and discuss such things. >> > > The reason this is needed is that if a job times out and we don't do its > syncpoint increments on the CPU, then a successive job incrementing that > same syncpoint would cause fences to signal incorrectly. The job that > was supposed to signal those fences didn't actually run; and any data > those fences were protecting would still be garbage. I'll need to re-read the previous discussion because IIRC, I was suggesting that once job is hung, all jobs should be removed from queue/PB and re-submitted, then the re-submitted jobs will use the new/updated sync point base. And we probably should need another drm_tegra_submit_cmd type that waits for a relative sync point increment. The drm_tegra_submit_cmd_wait_syncpt uses absolute sync point value and it shouldn't be used for sync point increments that are internal to a job because it complicates the recovery. All waits that are internal to a job should only wait for relative sync point increments. In the grate-kernel every job uses unique-and-clean sync point (which is also internal to the kernel driver) and a relative wait [1] is used for the job's internal sync point increments [2][3][4], and thus, kernel driver simply jumps over a hung job by updating DMAGET to point at the start of a next job. [1] https://github.com/grate-driver/linux/blob/master/drivers/gpu/drm/tegra/uapi/patching.c#L367 [2] https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/gr3d.c#L486 [3] https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/exa/copy_2d.c#L389 [4] https://github.com/grate-driver/xf86-video-opentegra/blob/master/src/gpu/tegra_stream_v2.c#L536