11.01.2021 15:59, Mikko Perttunen пишет: > Hi all, > > here's the fifth revision of the Host1x/TegraDRM UAPI proposal, > containing primarily small bug fixes. It has also been > rebased on top of recent linux-next. > > vaapi-tegra-driver has been updated to support the new UAPI > as well as Tegra186: > > https://github.com/cyndis/vaapi-tegra-driver > > The `putsurface` program has been tested to work. > > The test suite for the new UAPI is available at > https://github.com/cyndis/uapi-test > > The series can be also found in > https://github.com/cyndis/linux/commits/work/host1x-uapi-v5. > > Older versions: > v1: https://www.spinics.net/lists/linux-tegra/msg51000.html > v2: https://www.spinics.net/lists/linux-tegra/msg53061.html > v3: https://www.spinics.net/lists/linux-tegra/msg54370.html > v4: https://www.spinics.net/lists/dri-devel/msg279897.html > > Thank you, > Mikko The basic support for the v5 UAPI is added now to the Opentegra driver. In overall UAPI works, but there are couple things that we need to improve, I'll focus on them here. Problems ======== 1. The channel map/unmap API needs some more thought. The main problem is a difficulty to track liveness of BOs and mappings. The kernel driver refs BO for each mapping and userspace needs to track both BO and its mappings together, it's too easy to make mistake and leak BOs without noticing. 2. Host1x sync point UAPI should not be used for tracking DRM jobs. I remember we discussed this previously, but this pops up again and I don't remember where we ended previously. This creates unnecessary complexity for userspace. Userspace needs to go through a lot of chores just to get a sync point and then to manage it for the jobs. Nothing stops two different channels to reuse a single sync point and use it for a job, fixing this will only add more complexity to the kernel driver instead of removing it. 3. The signalling of DMA fences doesn't work properly in v5 apparently because of the host1x waiter bug. I see that sync point interrupt happens, but waiter callback isn't invoked. 4. The sync_file API is not very suitable for DRM purposes because of -EMFILE "Too many open files", which I saw while was running x11perf. It also adds complexity to userspace, instead of removing it. This approach not suitable for DRM scheduler as well. 5. Sync points have a dirty hardware state when allocated / requested. The special sync point reservation is meaningless in this case. 6. I found that the need to chop cmdstream into gathers is a bit cumbersome for userspace of older SoCs which don't have h/w firewall. Can we support option where all commands are collected into a single dedicated cmdstream for a job? Possible solutions for the above problems ========================================= 1. Stop to use concept of "channels". Switch to DRM context-only. Each DRM context should get access to all engines once DRM context is created. Think of it like "when DRM context is opened, it opens a channel for each engine". Then each DRM context will get one instance of mapping per-engine for each BO. enum tegra_engine { TEGRA_GR2D, TEGRA_GR3D, TEGRA_VIC, ... NUM_ENGINES }; struct tegra_bo_mapping { dma_addr_t ioaddr; ... }; struct tegra_bo { ... struct tegra_bo_mapping *hw_maps[NUM_ENGINES]; }; Instead of DRM_IOCTL_TEGRA_CHANNEL_MAP we should have DRM_IOCTL_TEGRA_GEM_MAP_TO_ENGINE, which will create a BO mapping for a specified h/w engine. Once BO is closed, all its mappings should be closed too. This way userspace doesn't need to track both BOs and mappings. Everything that userspace needs to do is: 1) Open DRM context 2) Create GEM 3) Map GEM to required hardware engines 4) Submit job that uses GEM handle 5) Close GEM If GEM wasn't mapped prior to the job's submission, then job will fail because kernel driver won't resolve the IO mapping of the GEM. 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point increments. The job's sync point will be allocated dynamically when job is submitted. We will need a fag for the sync_incr and wait_syncpt commands, saying "it's a job's sync point increment/wait" 3. We should use dma-fence API directly and waiter-shim should be removed. It's great that you're already working on this. 4. Sync file shouldn't be needed for the part of DRM API which doesn't interact with external non-DRM devices. We should use DRM syncobj for everything related to DRM, it's a superior API over sync file, it's suitable for DRM scheduler. 5. The hardware state of sync points should be reset when sync point is requested, not when host1x driver is initialized. 6. We will need to allocate a host1x BO for a job's cmdstream and add a restart command to the end of the job's stream. CDMA will jump into the job's stream from push buffer. We could add a flag for that to drm_tegra_submit_cmd_gather, saying that gather should be inlined into job's main cmdstream. This will remove a need to have a large push buffer that will easily overflow, it's a real problem and upstream driver even has a bug where it locks up on overflow. How it will look from CDMA perspective: PUSHBUF | --------- ... | | JOB | | --------- | JOB GATHER | RESTART ------> CMD | -------------- | |GATHER -------> DATA | ... <---------- RESTART| | | | | | What's missing ============== 1. Explicit and implicit fencing isn't there yet, we need to support DRM scheduler for that. 2. The "wait" command probably should be taught to take a syncobj handle in order to populate it with a dma-fence by kernel driver once job is submitted. This will give us intermediate fences of a job and allow utilize the syncobj features like "wait until job is submitted to h/w before starting to wait for timeout", which will be needed by userspace when DRM scheduler will be supported. Miscellaneous ============= 1. Please don't forget to bump driver version. This is important for userspace. 2. Please use a proper kernel coding style, use checkpatch. # git format-patch -v5 ... # ./scripts/checkpatch.pl --strict v5* 3. Kernel driver needs a rich error messages for each error condition and it should dump submitted job when firewall fails. It's very tedious to re-add it all each time when something doesn't work. 4. Previously firewall was using the client's class if is_valid_class wasn't specified in tegra_drm_client_ops, you changed it and now firewall fails for GR3D because it doesn't have the is_valid_class() set in the driver. See [1]. 5. The CDMA class should be restored to the class of a previous gather after the wait command in submit_gathers() and not to the class of the client. GR2D supports multiple classes. See [1]. [1] https://github.com/grate-driver/linux/commit/024cba369c9c0e2762e9890068ff9944cb10c44f