Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs

Dmitry Osipenko <digetx@xxxxxxxxx> · Tue, 23 Mar 2021 23:13:35 +0300

23.03.2021 22:57, Dmitry Osipenko пишет:
> 23.03.2021 21:21, Thierry Reding пишет:
>> On Sat, Feb 27, 2021 at 02:19:39PM +0300, Dmitry Osipenko wrote:
>>> 03.02.2021 14:18, Mikko Perttunen пишет:
>>> ...
>>>>> I'll need more time to think about it.
>>>>>
>>>>
>>>> How about something like this:
>>>>
>>>> Turn the syncpt_incr field back into an array of structs like
>>>>
>>>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ        (1<<0)
>>>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT    (1<<1)
>>>>
>>>> struct drm_tegra_submit_syncpt_incr {
>>>>     /* can be left as zero if using dynamic syncpt */
>>>>     __u32 syncpt_id;
>>>>     __u32 flags;
>>>>
>>>>     struct {
>>>>         __u32 syncobj;
>>>>         __u32 value;
>>>>     } fence;
>>>>
>>>>     /* patch word as such:
>>>>          * *word = *word | (syncpt_id << shift)
>>>>          */
>>>>     struct {
>>>>         __u32 gather_offset_words;
>>>>         __u32 shift;
>>>>     } patch;
>>>> };
>>>>
>>>> So this will work similarly to the buffer reloc system; the kernel
>>>> driver will allocate a job syncpoint and patch in the syncpoint ID if
>>>> requested, and allows outputting syncobjs for each increment.
>>>
>>> I haven't got any great ideas so far, but it feels that will be easier
>>> and cleaner if we could have separate job paths (and job IOCTLS) based
>>> on hardware generation since the workloads a too different. The needs of
>>> a newer h/w are too obscure for me and absence of userspace code,
>>> firmware sources and full h/w documentation do not help.
>>>
>>> There still should be quite a lot to share, but things like
>>> mapping-to-channel and VM sync points are too far away from older h/w,
>>> IMO. This means that code path before drm-sched and path for job-timeout
>>> handling should be separate.
>>>
>>> Maybe later on it will become cleaner that we actually could unify it
>>> all nicely, but for now it doesn't look like a good idea to me.
>>
>> Sorry for jumping in rather randomly here and elsewhere, but it's been a
>> long time since the discussion and I just want to share my thoughts on a
>> couple of topics in order to hopefully help move this forward somehow.
>>
>> For UAPI, "unifying it later" doesn't really work.
> 
> Of course I meant a "later version of this series" :) Sorry for not
> making it clear.
> 
>> So I think the only
>> realistic option is to make a best attempt at getting the UABI right so
>> that it works for all existing use-cases and ideally perhaps even as of
>> yet unknown use-cases in the future. As with all APIs this means that
>> there's going to be a need to abstract away some of the hardware details
>> so that we don't have to deal with too many low-level details in
>> userspace, because otherwise the UAPI is just going to be a useless
>> mess.
>>
>> I think a proposal such as the above to allow both implicit and explicit
>> syncpoints makes sense. For what it's worth, it's fairly similar to what
>> we had come up with last time we tried destaging the ABI, although back
>> at the time I'm not sure we had even considered explicit syncpoint usage
>> yet. I think where reasonably possible this kind of optional behaviour
>> is acceptable, but I don't think having two completely separate paths is
>> going to help in any way. If anything it's just going to make it more
>> difficult to maintain the code and get it to a usable state in the first
>> place.
>>
>> Like I said elsewhere, the programming model for host1x hasn't changed
>> since Tegra20. It's rather evolved and gained a couple more features,
>> but that doesn't change anything about how userspace uses it.
> 
> Not having a complete control over sync points state is a radical
> change, IMO. I prefer not to use this legacy and error-prone way of sync
> points handling at least for older h/w where it's possible to do. This
> is what downstream driver did 10 years ago and what it still continues
> to do. I was very happy to move away from this unnecessary complication
> in the experimental grate-kernel driver and I think this will be great
> to do in the mainline as well.
> 
> The need to map buffers explicitly is also a big difference. The need to
> map BO for each channel is a quite big over-complication as we already
> found out in the current version of UAPI.
> 
> Alright, perhaps the mapping could improved for older userspace if we
> will move away from a per-channel contexts to a single DRM context like
> I already was suggesting to Mikko before. I.e. instead of mapping BO "to
> a channel", we will need to map BO "to h/w clients" within the DRM
> context. This should allow older userspace to create a single mapping
> for all channels/clients using a single IOCTL and then to have a single
> "mapping handle" to care about. Objections?
> 

I just recalled that Mikko didn't like *not* to have the per-channel
contexts, saying that it should be good to have for later SoCs.

We could have a DRM context and channel contexts within the DRM context.
Then the mapping will need to be done to the channel contexts of the DRM
context, this should allow us to retain the per-channel contexts and
have a single mapping for older userspace.