Re: [RFC] Host1x/TegraDRM UAPI (sync points)

Mikko Perttunen <cyndis@xxxxxxxx> · Tue, 30 Jun 2020 13:26:16 +0300

On 6/29/20 10:42 PM, Dmitry Osipenko wrote:

Secondly, I suppose neither GPU, nor DLA could wait on a host1x sync
point, correct? Or are they integrated with Host1x HW?

They can access syncpoints directly. (That's what I alluded to in the 
"Introduction to the hardware" section :) all those things have hardware 
access to syncpoints)

>
> .. rest ..
>

Let me try to summarize once more for my own understanding:

* When submitting a job, you would allocate new syncpoints for the job
* After submitting the job, those syncpoints are not usable anymore
* Postfences of that job would keep references to those syncpoints so 
they aren't freed and cleared before the fences have been released
* Once postfences have been released, syncpoints would be returned to 
the pool and reset to zero

The advantage of this would be that at any point in time, there would be 
a 1:1 correspondence between allocated syncpoints and jobs; so you could 
 shuffle the jobs around channels or reorder them.

Please correct if I got that wrong :)

---

I have two concerns:

* A lot of churn on syncpoints - any time you submit a job you might not 
get a syncpoint for an indefinite time. If we allocate syncpoints 
up-front at least you know beforehand, and then you have the syncpoint 
as long as you need it.
* Plumbing the dma-fence/sync_file everywhere, and keeping it alive 
until waits on it have completed, is more work than just having the 
ID/threshold. This is probably mainly a problem for downstream, where 
updating code for this would be difficult. I know that's not a proper 
argument but I hope we can reach something that works for both worlds.

Here's a proposal in between:

* Keep syncpoint allocation and submission in jobs as in my original 
proposal
* Don't attempt to recover user channel contexts. What this means:
  * If we have a hardware channel per context (MLOCKing), just tear 
down the channel
  * Otherwise, we can just remove (either by patching or by full 
teardown/resubmit of the channel) all jobs submitted by the user channel 
context that submitted the hanging job. Jobs of other contexts would be 
undisturbed (though potentially delayed, which could be taken into 
account and timeouts adjusted)
* If this happens, we can set removed jobs' post-fences to error status 
and user will have to resubmit them.
* We should be able to keep the syncpoint refcounting based on fences.

This can be made more fine-grained by not caring about the user channel 
context, but tearing down all jobs with the same syncpoint. I think the 
result would be that we can get either what you described (or how I 
understood it in the summary in the beginning of the message), or a more 
traditional syncpoint-per-userctx workflow, depending on how the 
userspace decides to allocate syncpoints.

If needed, the kernel can still do e.g. reordering (you mentioned job 
priorities) at syncpoint granularity, which, if the userspace followed 
the model you described, would be the same thing as job granularity.

(Maybe it would be more difficult with current drm_scheduler, sorry, 
haven't had the time yet to read up on that. Dealing with clearing work 
stuff up before summer vacation)

Mikko