Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI

Dmitry Osipenko <digetx@xxxxxxxxx> · Wed, 20 Jan 2021 01:29:20 +0300

11.01.2021 15:59, Mikko Perttunen пишет:
> Hi all,
> 
> here's the fifth revision of the Host1x/TegraDRM UAPI proposal,
> containing primarily small bug fixes. It has also been
> rebased on top of recent linux-next.
> 
> vaapi-tegra-driver has been updated to support the new UAPI
> as well as Tegra186:
> 
>   https://github.com/cyndis/vaapi-tegra-driver
> 
> The `putsurface` program has been tested to work.
> 
> The test suite for the new UAPI is available at
> https://github.com/cyndis/uapi-test
> 
> The series can be also found in
> https://github.com/cyndis/linux/commits/work/host1x-uapi-v5.
> 
> Older versions:
> v1: https://www.spinics.net/lists/linux-tegra/msg51000.html
> v2: https://www.spinics.net/lists/linux-tegra/msg53061.html
> v3: https://www.spinics.net/lists/linux-tegra/msg54370.html
> v4: https://www.spinics.net/lists/dri-devel/msg279897.html
> 
> Thank you,
> Mikko

The basic support for the v5 UAPI is added now to the Opentegra driver.
 In overall UAPI works, but there are couple things that we need to
improve, I'll focus on them here.

Problems
========

1. The channel map/unmap API needs some more thought.

The main problem is a difficulty to track liveness of BOs and mappings.
 The kernel driver refs BO for each mapping and userspace needs to track
both BO and its mappings together, it's too easy to make mistake and
leak BOs without noticing.

2. Host1x sync point UAPI should not be used for tracking DRM jobs.  I
remember we discussed this previously, but this pops up again and I
don't remember where we ended previously.

This creates unnecessary complexity for userspace.  Userspace needs to
go through a lot of chores just to get a sync point and then to manage
it for the jobs.

Nothing stops two different channels to reuse a single sync point and
use it for a job, fixing this will only add more complexity to the
kernel driver instead of removing it.

3. The signalling of DMA fences doesn't work properly in v5 apparently
because of the host1x waiter bug.  I see that sync point interrupt
happens, but waiter callback isn't invoked.

4. The sync_file API is not very suitable for DRM purposes because of
-EMFILE "Too many open files", which I saw while was running x11perf.
It also adds complexity to userspace, instead of removing it.  This
approach not suitable for DRM scheduler as well.

5. Sync points have a dirty hardware state when allocated / requested.
The special sync point reservation is meaningless in this case.

6. I found that the need to chop cmdstream into gathers is a bit
cumbersome for userspace of older SoCs which don't have h/w firewall.
Can we support option where all commands are collected into a single
dedicated cmdstream for a job?

Possible solutions for the above problems
=========================================

1. Stop to use concept of "channels". Switch to DRM context-only.

Each DRM context should get access to all engines once DRM context is
created.  Think of it like "when DRM context is opened, it opens a
channel for each engine".

Then each DRM context will get one instance of mapping per-engine for
each BO.

enum tegra_engine {
	TEGRA_GR2D,
	TEGRA_GR3D,
	TEGRA_VIC,
	...
	NUM_ENGINES
};

struct tegra_bo_mapping {
	dma_addr_t ioaddr;
	...
};

struct tegra_bo {
	...
	struct tegra_bo_mapping *hw_maps[NUM_ENGINES];
};

Instead of DRM_IOCTL_TEGRA_CHANNEL_MAP we should have
DRM_IOCTL_TEGRA_GEM_MAP_TO_ENGINE, which will create a BO mapping for a
specified h/w engine.

Once BO is closed, all its mappings should be closed too.  This way
userspace doesn't need to track both BOs and mappings.

Everything that userspace needs to do is:

	1) Open DRM context
	2) Create GEM
	3) Map GEM to required hardware engines
	4) Submit job that uses GEM handle
	5) Close GEM

If GEM wasn't mapped prior to the job's submission, then job will fail
because kernel driver won't resolve the IO mapping of the GEM.

2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
increments.  The job's sync point will be allocated dynamically when job
is submitted.  We will need a fag for the sync_incr and wait_syncpt
commands, saying "it's a job's sync point increment/wait"

3. We should use dma-fence API directly and waiter-shim should be
removed.  It's great that you're already working on this.

4. Sync file shouldn't be needed for the part of DRM API which doesn't
interact with external non-DRM devices.  We should use DRM syncobj for
everything related to DRM, it's a superior API over sync file, it's
suitable for DRM scheduler.

5. The hardware state of sync points should be reset when sync point is
requested, not when host1x driver is initialized.

6. We will need to allocate a host1x BO for a job's cmdstream and add a
restart command to the end of the job's stream.  CDMA will jump into the
job's stream from push buffer.

We could add a flag for that to drm_tegra_submit_cmd_gather, saying that
gather should be inlined into job's main cmdstream.

This will remove a need to have a large push buffer that will easily
overflow, it's a real problem and upstream driver even has a bug where
it locks up on overflow.

How it will look from CDMA perspective:

PUSHBUF |
---------
...     |      | JOB   |
        |      ---------       | JOB GATHER |
RESTART	------> CMD    |       --------------
        |      |GATHER -------> DATA        |
... <---------- RESTART|       |            |
        |      |       |

What's missing
==============

1. Explicit and implicit fencing isn't there yet, we need to support DRM
scheduler for that.

2. The "wait" command probably should be taught to take a syncobj handle
in order to populate it with a dma-fence by kernel driver once job is
submitted.  This will give us intermediate fences of a job and allow
utilize the syncobj features like "wait until job is submitted to h/w
before starting to wait for timeout", which will be needed by userspace
when DRM scheduler will be supported.

Miscellaneous
=============

1. Please don't forget to bump driver version.  This is important for
userspace.

2. Please use a proper kernel coding style, use checkpatch.

   # git format-patch -v5 ...
   # ./scripts/checkpatch.pl --strict v5*

3. Kernel driver needs a rich error messages for each error condition
and it should dump submitted job when firewall fails.  It's very tedious
to re-add it all each time when something doesn't work.

4. Previously firewall was using the client's class if is_valid_class
wasn't specified in tegra_drm_client_ops, you changed it and now
firewall fails for GR3D because it doesn't have the is_valid_class() set
in the driver.  See [1].

5. The CDMA class should be restored to the class of a previous gather
after the wait command in submit_gathers() and not to the class of the
client.  GR2D supports multiple classes.  See [1].

[1]
https://github.com/grate-driver/linux/commit/024cba369c9c0e2762e9890068ff9944cb10c44f