On 10/31/20 1:13 AM, Dmitry Osipenko wrote:
28.10.2020 12:54, Mikko Perttunen wrote:
On 10/27/20 9:06 PM, Dmitry Osipenko wrote:
26.10.2020 12:11, Mikko Perttunen wrote:
The first patches should be the ones that are relevant to the existing
userspace code, like support for the waits.
Can you elaborate what you mean by this?
All features that don't have an immediate real use-case should be placed
later in the series, because we may defer merging those patches until
we see userspace that uses them. We can't really decide in advance
whether these are decisions that we won't regret later on; only
practical application can confirm the correctness.
I was more referring to the "support for waits" part, should have
clarified that.
The "support for waits" is support for the WAIT_SYNCPT command exposed
to userspace, which we could utilize right now.
Partial mappings should be a separate feature because it's a
questionable feature that needs to be proved by a real userspace first.
Maybe it would be even better to drop it for starters, to ease
reviewing.
Considering that the "no-op" support for it (map the whole buffer but
just keep track of the starting offset) is only a couple of lines, I'd
like to keep it in.
There is no tracking in the current code that prevents duplicated
mappings; will we need to care about it? This is a bit too questionable
a feature for now, IMO. I'd like to see it as a separate patch.
I don't think there is any need to special case duplicated mappings. I
think this is a pretty obvious feature to have, but I can rename them to
reserved0 and reserved1 and require that reserved0 is zero and reserved1
is the size of the passed GEM object.
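A minimal sketch of the check proposed above, assuming the offset/length
fields are renamed to reserved0/reserved1. The struct and function names
here are hypothetical and are not taken from the proposed UAPI:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the channel_map arguments (illustration only). */
struct example_channel_map {
	uint32_t handle;    /* GEM handle */
	uint64_t reserved0; /* would-be map offset: must be zero for now */
	uint64_t reserved1; /* would-be map length: must equal the object size */
};

/*
 * Accept only arguments that describe a whole-buffer mapping.
 * Returns 0 on success, -EINVAL otherwise.
 */
static int example_check_map_args(const struct example_channel_map *args,
				  uint64_t gem_size)
{
	assert(args != NULL);

	if (args->reserved0 != 0)
		return -EINVAL;
	if (args->reserved1 != gem_size)
		return -EINVAL;

	return 0;
}
```

Rejecting anything else now means the fields could later be given their
offset/length meaning without changing behaviour for existing callers.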
I'm now concerned about the reserved fields after seeing this reply from
Daniel Vetter:
https://www.mail-archive.com/nouveau@xxxxxxxxxxxxxxxxxxxxx/msg36324.html
If DRM IOCTL structs are zero-extended, then perhaps we won't need to
reserve anything?
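To illustrate what "zero-extended" would mean here, a simplified
userspace model of the copy-in step: if the caller passes a smaller
(older) struct, the kernel-side copy has its tail zeroed, so fields added
later read as 0. The struct layouts and the helper name are hypothetical;
this is a sketch of the idea, not the DRM core code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical old and new layouts of the same ioctl argument struct. */
struct example_args_v1 {
	uint32_t handle;
	uint32_t flags;
};

struct example_args_v2 {
	uint32_t handle;
	uint32_t flags;
	uint64_t offset; /* added later */
	uint64_t length; /* added later */
};

/*
 * Copy at most ksize bytes from the caller's struct and zero whatever
 * the caller did not provide.
 */
static void example_copy_in(void *kdata, size_t ksize,
			    const void *udata, size_t usize)
{
	size_t n = usize < ksize ? usize : ksize;

	assert(kdata != NULL && udata != NULL);

	memcpy(kdata, udata, n);
	if (ksize > n)
		memset((char *)kdata + n, 0, ksize - n);
}
```

Under this model an old caller passing example_args_v1 is seen by new
code as offset == 0 and length == 0, which only works if zero is defined
as the "old behaviour" value for every field added later.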
I guess for the channel_map we can drop the offset/length, I just think
it's fairly obvious that an IOMMU mapping API lets you specify from
where and how much you want to map. Sure, it's not a functionality
blocker as it can simply be implemented in userspace by shifting the
reloc offset / IOVA equivalently, but it will reduce IO address space
usage and prevent access to memory that was not intended to be mapped to
the engine. The latter becomes a major PITA if you need to create safety
documentation at this level -- don't know if this is relevant on Linux
or not..
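A rough sketch of the trade-off described above, with hypothetical names
and arbitrary IOVA values. The address arithmetic is the same whether the
whole buffer or only a window is mapped; the difference is which buffer
offsets the engine can reach at all:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one mapping of a buffer into the IOVA space. */
struct example_mapping {
	uint64_t iova;   /* device address of the first mapped byte */
	uint64_t offset; /* start of the mapped window within the buffer */
	uint64_t length; /* size of the mapped window in bytes */
};

/*
 * Translate a byte offset within the buffer to a device address.
 * Returns 0 and stores the address on success, or -1 if the offset is
 * outside the mapped window (the engine could not access it).
 */
static int example_device_addr(const struct example_mapping *m,
			       uint64_t buf_off, uint64_t *addr)
{
	assert(m != NULL && addr != NULL);

	if (buf_off < m->offset || buf_off - m->offset >= m->length)
		return -1;

	*addr = m->iova + (buf_off - m->offset);
	return 0;
}
```

With a whole-buffer mapping (offset 0, length equal to the buffer size)
every offset translates, and userspace simply adds its own shift. With a
partial mapping only the window consumes IO address space, and anything
outside it is unreachable by the engine.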
...
I'd like to see the DRM_SCHED and syncobj support. I can help you with
it if it's out of your scope for now.
I already wrote some code for syncobj I can probably pull in. Regarding
DRM_SCHED, help is accepted. However, we should keep using the hardware
scheduler, and considering it's a bigger piece of work, let's not block
this series on it.
I'd like to see all the custom IOCTLs deprecated and replaced with
the generic DRM API wherever possible. Hence, I think it should be a
mandatory feature that we need to focus on. The current WIP userspace
already uses and relies on DRM_SCHED.
From my point of view, the ABI needs to be designed such that it can
replace what we have downstream, i.e. it needs to support the usecases
the downstream nvhost ABI supports currently. Otherwise there is no
migration path to upstream and it's not worth it for me to work on this.
The downstream needs should be irrelevant for the upstream, please read
this:
https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#open-source-userspace-requirements
It may happen that some of the downstream features could become useful
for upstream, but we won't know until we see the full userspace code.
We don't have a comprehensive userspace which could utilize all the new
features, and that's why the upstream driver has stagnated for many
years now. The grate-drivers would greatly benefit from the updated ABI,
but I think that we need at least a usable mesa driver first; that's why
I haven't bothered to upstream anything from the WIP UAPI v2.
In order to upstream new UAPI features we will need:
1. Hardware specs (from vendor or reverse-engineered).
2. Regression tests.
3. A non-toy opensource userspace.
Although, I don't see what this ABI is missing that your userspace would
rely on. Does it submit jobs in reverse order that would deadlock if
drm_sched didn't reorder them based on prefences, or something?
It's the opposite: we don't have userspace which needs the majority of
the proposed ABI. This needs to be fixed before we can seriously consider
merging the new features.
I'm pretty sure that you were already aware of all the upstreaming
requirements and we will see the usable opensource userspace at some
point, correct?
I am well aware of that. I'm not saying that we should copy the
downstream stack. I am saying that when designing an ABI, we should
consider all information available on what kind of features would be
required from it.
Going through the proposed TegraDRM UAPI, there are some features that
would probably not be immediately used by userspace, or supported in a
non-NOOP fashion by the kernel:
* Map offset/length
* IOVA of mapping
* Creation of sync_file postfence
* Waiting for sync_file prefences
* SUBMIT_CONTEXT_SETUP flag
* Having two syncpt_incrs
* Reservations?
I suppose we can remove all of that for now, as long as we ensure that
there is a path to add them back. I'm just a bit concerned that we'll
end up with 10 different ABI revisions and userspace will have to do a
version detection dance and enable things depending on driver version.
Anything else to remove?
Regarding things like explicit channel_map, sure, we could map things
implicitly at submit time, but it is a huge performance improvement on
newer chips, at least. So technically userspace doesn't need it, but
going by that argument, we can get rid of a lot of kernel functionality
-- after all, it's only needed if you want your hardware to perform well.
For now it will be good to have this series in the form of
work-in-progress patches, continuing to discuss and update it. Meanwhile
we will need to work on the userspace part as well, which is a much
bigger blocker.
I'm hoping that porting the userspace won't take that long. It shouldn't
be that big of a hurdle.
Software-based scheduling might make sense in situations where the
channel is shared by all processes, but at least for modern chips that
are designed to use hardware scheduling, I want to use that capability.
The software-based scheduling is indeed needed for the older SoCs in
order not to block hardware channels and job-submission code paths.
Maybe it could become useful for newer SoCs as well once we get
closer to a usable userspace :)
Considering that many products were successfully shipped without
software-based scheduling, I wouldn't consider it "needed".
It will be great to have the hardware-based scheduling supported, but I
assume that it needs to be done on top of DRM_SCHED. This should allow
us to remove all the buggy host1x pushbuffer locking code (which is
known to deadlock) and support non-host1x fences for free.
If it is known to deadlock, we should fix it. In any case, which kind of
scheduler is used shouldn't affect the ABI, and we already have a
functional implementation in the Host1x driver, so we should merge the ABI
first rather than wait for another year while the rest of the driver is
redesigned and rewritten.
Mikko