On Tue, Jun 23, 2020 at 3:03 PM Mikko Perttunen <cyndis@xxxxxxxx> wrote:
>
> # Host1x/TegraDRM UAPI proposal
>
> This is a proposal for a stable UAPI for Host1x and TegraDRM, to replace
> the current TegraDRM UAPI that is behind `STAGING` and quite obsolete in
> many ways.
>
> I haven't written any implementation yet -- I'll do that once there is
> some agreement on the high-level design.
>
> Current open items:
>
> * The syncpoint UAPI allows userspace to create sync_file FDs with
>   arbitrary syncpoint fences. The dma_fence code currently seems to
>   assume that all fences will be signaled, which would not necessarily
>   be the case with this interface.
> * Previously present GEM IOCTLs (GEM_CREATE, GEM_MMAP) are not present.
>   Not sure if they are still needed.
>

Hi, as this wasn't addressed here (and sorry if I missed it): is there
an open-source userspace making use of this UAPI? This is something
that needs to be seen before the UAPI can be included at all.

> ## Introduction to the hardware
>
> Tegra Host1x is a hardware block providing the following capabilities:
>
> * Syncpoints, a unified whole-system synchronization primitive that
>   allows synchronizing work between graphics, compute and multimedia
>   engines, CPUs (including cross-VM synchronization) and devices on the
>   PCIe bus, without incurring CPU overhead.
> * Channels, a command DMA mechanism that allows asynchronous
>   programming of the various engines, integrating with syncpoints.
> * Hardware virtualization support for syncpoints and channels (on
>   Tegra186 and newer).
>
> This proposal defines APIs for userspace access to syncpoints and
> channels. Kernel drivers can additionally use syncpoints and channels
> internally, providing other userspace interfaces (e.g. V4L2).
>
> The syncpoint and channel interfaces are split into separate parts, as
> syncpoints are useful as a system synchronization primitive even
> without the engine drivers provided through TegraDRM. For example, a
> computer vision pipeline consisting of video capture, CPU processing
> and GPU processing would not necessarily use the engines provided by
> TegraDRM. See the "Example workflows" section for more details.
>
> ## Syncpoint interface
>
> Syncpoints are a set of 32-bit values providing the following
> operations:
>
> * Atomically increment the value by one
> * Read the current value
> * Wait until the value reaches a specified threshold. For waiting, the
>   32-bit value space is treated modulo 2^32; e.g. if the current value
>   is 0xffffffff, then the value 0x0 is considered to be one increment
>   in the future.
>
> Each syncpoint is identified by a system-global ID in the range
> [0, number of syncpoints supported by the hardware). The entire system
> has read-only access to all syncpoints based on their ID.
>
> Syncpoints are managed through the device node /dev/host1x provided by
> the gpu/host1x driver.
>
> ### IOCTL HOST1X_ALLOCATE_SYNCPOINT (on /dev/host1x)
>
> Allocates a free syncpoint, returning a file descriptor representing
> it. Only the owner of the file descriptor is allowed to mutate the
> value of the syncpoint.
>
> ```
> struct host1x_ctrl_allocate_syncpoint {
>         /**
>          * @fd:
>          *
>          * [out] New file descriptor representing the allocated
>          * syncpoint.
>          */
>         __s32 fd;
>
>         __u32 reserved[3];
> };
> ```
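>
> As a usage illustration, here is a minimal allocation sketch. The
> ioctl request code HOST1X_IOCTL_ALLOCATE_SYNCPOINT is a placeholder
> name -- this proposal does not define the final request numbers, so
> the real macro and struct definitions would come from the eventual
> uapi header.
>
> ```
> #include <fcntl.h>
> #include <linux/types.h>
> #include <string.h>
> #include <sys/ioctl.h>
> #include <unistd.h>
>
> /* Sketch only: HOST1X_IOCTL_ALLOCATE_SYNCPOINT is a placeholder. */
> int allocate_syncpoint(void)
> {
>         struct host1x_ctrl_allocate_syncpoint args;
>         int host1x_fd, syncpt_fd;
>
>         host1x_fd = open("/dev/host1x", O_RDWR);
>         if (host1x_fd < 0)
>                 return -1;
>
>         memset(&args, 0, sizeof(args));
>         if (ioctl(host1x_fd, HOST1X_IOCTL_ALLOCATE_SYNCPOINT, &args) < 0)
>                 syncpt_fd = -1;
>         else
>                 syncpt_fd = args.fd; /* only this FD's owner may
>                                         mutate the syncpoint value */
>
>         close(host1x_fd);
>         return syncpt_fd;
> }
> ```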
>
> ### IOCTL HOST1X_SYNCPOINT_INFO (on syncpoint file descriptor)
>
> Allows retrieval of the system-global syncpoint ID corresponding to
> the syncpoint.
>
> Use cases:
>
> * Passing the ID to other system components that identify syncpoints
>   by ID
> * Debugging and testing
>
> ```
> struct host1x_syncpoint_info {
>         /**
>          * @id:
>          *
>          * [out] System-global ID of the syncpoint.
>          */
>         __u32 id;
>
>         __u32 reserved[3];
> };
> ```
>
> ### IOCTL HOST1X_SYNCPOINT_INCREMENT (on syncpoint file descriptor)
>
> Allows incrementing of the syncpoint value.
>
> Use cases:
>
> * Signalling work completion when executing a pipeline step on the CPU
> * Debugging and testing
>
> ```
> struct host1x_syncpoint_increment {
>         /**
>          * @count:
>          *
>          * [in] Number of times to increment the syncpoint. The
>          * syncpoint value may be observed at intermediate values,
>          * but each individual increment is atomic.
>          */
>         __u32 count;
> };
> ```
>
> ### IOCTL HOST1X_READ_SYNCPOINT (on /dev/host1x)
>
> Read the value of a syncpoint based on its ID.
>
> Use cases:
>
> * Allows more fine-grained tracking of task progression for debugging
>   purposes
>
> ```
> struct host1x_ctrl_read_syncpoint {
>         /**
>          * @id:
>          *
>          * [in] ID of the syncpoint to read.
>          */
>         __u32 id;
>
>         /**
>          * @value:
>          *
>          * [out] Value of the syncpoint.
>          */
>         __u32 value;
> };
> ```
>
> ### IOCTL HOST1X_CREATE_FENCE (on /dev/host1x)
>
> Creates a new SYNC_FILE fence file descriptor for the specified
> syncpoint ID and threshold.
>
> Use cases:
>
> * Creating a fence when receiving an ID/threshold pair from another
>   system component
> * Creating a postfence when executing a pipeline step on the CPU
> * Creating a postfence when executing a pipeline step controlled by
>   userspace (e.g. GPU userspace submission)
>
> ```
> struct host1x_ctrl_create_fence {
>         /**
>          * @id:
>          *
>          * [in] ID of the syncpoint for which to create a fence.
>          */
>         __u32 id;
>
>         /**
>          * @threshold:
>          *
>          * [in] Threshold value for the fence.
>          */
>         __u32 threshold;
>
>         /**
>          * @fence_fd:
>          *
>          * [out] New sync_file FD corresponding to the ID and
>          * threshold.
>          */
>         __s32 fence_fd;
>
>         __u32 reserved[1];
> };
> ```
>
> ### IOCTL HOST1X_GET_FENCE_INFO (on /dev/host1x)
>
> Allows retrieval of the ID/threshold pairs corresponding to a
> SYNC_FILE fence or fence array.
>
> Use cases:
>
> * Debugging and testing
> * Transmitting a fence to another system component requiring
>   ID/threshold
> * Getting the ID/threshold for a prefence when programming a pipeline
>   step controlled by userspace (e.g. GPU userspace submission)
>
> ```
> /* If set, the corresponding fence is backed by Host1x syncpoints. */
> #define HOST1X_CTRL_FENCE_INFO_SYNCPOINT_FENCE (1 << 0)
>
> struct host1x_ctrl_fence_info {
>         /**
>          * @flags:
>          *
>          * [out] HOST1X_CTRL_FENCE_INFO flags.
>          */
>         __u32 flags;
>
>         /**
>          * @id:
>          *
>          * [out] ID of the syncpoint corresponding to this fence.
>          * Only set if HOST1X_CTRL_FENCE_INFO_SYNCPOINT_FENCE is set.
>          */
>         __u32 id;
>
>         /**
>          * @threshold:
>          *
>          * [out] Signalling threshold of the fence.
>          * Only set if HOST1X_CTRL_FENCE_INFO_SYNCPOINT_FENCE is set.
>          */
>         __u32 threshold;
>
>         __u32 reserved[1];
> };
>
> struct host1x_ctrl_get_fence_info {
>         /**
>          * @fence_fd:
>          *
>          * [in] Syncpoint-backed sync_file FD for which to retrieve
>          * info.
>          */
>         __s32 fence_fd;
>
>         /**
>          * @num_fences:
>          *
>          * [in] Size of the `fence_info` array in elements.
>          * [out] Number of fences held by the FD.
>          */
>         __u32 num_fences;
>
>         /**
>          * @fence_info:
>          *
>          * [in] Pointer to an array of struct host1x_ctrl_fence_info
>          * where the info will be stored.
>          */
>         __u64 fence_info;
>
>         __u32 reserved[1];
> };
> ```
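>
> To make the intended flow concrete, here is a hedged sketch that
> creates a fence and decodes its ID/threshold back out. As before, the
> HOST1X_IOCTL_* request codes are placeholder names not defined by
> this proposal.
>
> ```
> #include <linux/types.h>
> #include <stdint.h>
> #include <stdio.h>
> #include <sys/ioctl.h>
>
> /* Create a sync_file for (id, threshold), then read its info back. */
> int fence_roundtrip(int host1x_fd, __u32 id, __u32 threshold)
> {
>         struct host1x_ctrl_create_fence create = {
>                 .id = id,
>                 .threshold = threshold,
>         };
>         struct host1x_ctrl_fence_info info = { 0 };
>         struct host1x_ctrl_get_fence_info get = { 0 };
>
>         if (ioctl(host1x_fd, HOST1X_IOCTL_CREATE_FENCE, &create) < 0)
>                 return -1;
>
>         get.fence_fd = create.fence_fd;
>         get.num_fences = 1; /* room for one entry */
>         get.fence_info = (__u64)(uintptr_t)&info;
>
>         if (ioctl(host1x_fd, HOST1X_IOCTL_GET_FENCE_INFO, &get) < 0)
>                 return -1;
>
>         if (info.flags & HOST1X_CTRL_FENCE_INFO_SYNCPOINT_FENCE)
>                 printf("fence: syncpoint %u, threshold %u\n",
>                        info.id, info.threshold);
>
>         return create.fence_fd;
> }
> ```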
>
> ## Channel interface
>
> ### DRM_TEGRA_OPEN_CHANNEL
>
> ```
> struct drm_tegra_open_channel {
>         /**
>          * @host1x_class:
>          *
>          * [in] Host1x class (engine) the channel will target.
>          */
>         __u32 host1x_class;
>
>         /**
>          * @flags:
>          *
>          * [in] Flags. Currently none are specified.
>          */
>         __u32 flags;
>
>         /**
>          * @channel_id:
>          *
>          * [out] Process-specific identifier corresponding to the
>          * opened channel. Not the hardware channel ID.
>          */
>         __u32 channel_id;
>
>         /**
>          * @hardware_version:
>          *
>          * [out] Hardware version of the engine targeted by the
>          * channel. Userspace can use this to select appropriate
>          * programming sequences.
>          */
>         __u32 hardware_version;
>
>         /**
>          * @mode:
>          *
>          * [out] Mode the hardware is executing in. Some engines can
>          * be configured with different firmware supporting different
>          * functionality depending on the system configuration. This
>          * value allows userspace to detect if the engine is
>          * configured for the intended use case.
>          */
>         __u32 mode;
>
>         __u32 reserved[3];
> };
> ```
>
> ### DRM_TEGRA_CLOSE_CHANNEL
>
> ```
> struct drm_tegra_close_channel {
>         /**
>          * @channel_id:
>          *
>          * [in] ID of the channel to close.
>          */
>         __u32 channel_id;
>
>         __u32 reserved[3];
> };
> ```
>
> ### DRM_TEGRA_CHANNEL_MAP
>
> Make memory accessible by the engine while executing work on the
> channel.
>
> ```
> #define DRM_TEGRA_CHANNEL_MAP_READWRITE (1 << 0)
>
> struct drm_tegra_channel_map {
>         /*
>          * [in] ID of the channel to map the memory for.
>          */
>         __u32 channel_id;
>
>         /*
>          * [in] GEM handle of the memory to map.
>          */
>         __u32 handle;
>
>         /*
>          * [in] Offset within the GEM object of the memory area to
>          * map.
>          *
>          * Must be aligned to 4K.
>          */
>         __u64 offset;
>
>         /*
>          * [in] Length of the memory area to map, in bytes.
>          *
>          * Must be aligned to 4K.
>          */
>         __u64 length;
>
>         /*
>          * [out] IOVA of the mapped memory. Userspace can use this
>          * IOVA directly to refer to the memory, skipping relocations.
>          * Only available if hardware memory isolation is enabled.
>          *
>          * Will be set to 0xffff_ffff_ffff_ffff if unavailable.
>          */
>         __u64 iova;
>
>         /*
>          * [out] ID corresponding to the mapped memory, to be used
>          * for relocations or unmapping.
>          */
>         __u32 mapping_id;
>
>         /*
>          * [in] Flags.
>          */
>         __u32 flags;
>
>         __u32 reserved[6];
> };
> ```
>
> ### DRM_TEGRA_CHANNEL_UNMAP
>
> Unmap previously mapped memory. Userspace shall do this only after it
> has determined the channel will no longer access the memory.
>
> ```
> struct drm_tegra_channel_unmap {
>         /*
>          * [in] ID of the mapping to remove.
>          */
>         __u32 mapping_id;
>
>         __u32 reserved[3];
> };
> ```
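>
> The open/map sequence might look like the sketch below. The ioctl
> macros (DRM_IOCTL_TEGRA_*) and the HOST1X_CLASS_VIC constant are
> assumed names for illustration, not part of this proposal.
>
> ```
> #include <linux/types.h>
> #include <string.h>
> #include <sys/ioctl.h>
>
> /* Open a channel to an engine and map a GEM object for it. */
> int open_and_map(int drm_fd, __u32 gem_handle, __u64 size,
>                  struct drm_tegra_channel_map *map)
> {
>         struct drm_tegra_open_channel open_args = {
>                 .host1x_class = HOST1X_CLASS_VIC, /* assumed constant */
>         };
>
>         if (ioctl(drm_fd, DRM_IOCTL_TEGRA_OPEN_CHANNEL, &open_args) < 0)
>                 return -1;
>
>         memset(map, 0, sizeof(*map));
>         map->channel_id = open_args.channel_id;
>         map->handle = gem_handle;  /* e.g. from a dma-buf import */
>         map->offset = 0;           /* must be 4K-aligned */
>         map->length = size;        /* must be 4K-aligned */
>         map->flags = DRM_TEGRA_CHANNEL_MAP_READWRITE;
>
>         if (ioctl(drm_fd, DRM_IOCTL_TEGRA_CHANNEL_MAP, map) < 0)
>                 return -1;
>
>         if (map->iova == ~0ULL) {
>                 /* No direct IOVA: pass a drm_tegra_submit_relocation
>                  * referencing map->mapping_id at submit time. */
>         }
>
>         return open_args.channel_id;
> }
> ```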
>
> ### DRM_TEGRA_CHANNEL_SUBMIT
>
> Submit a job to the engine/class targeted by the channel.
>
> ```
> struct drm_tegra_submit_syncpt_incr {
>         /*
>          * [in] Syncpoint FD of the syncpoint that the job will
>          * increment.
>          */
>         __s32 syncpt_fd;
>
>         /*
>          * [in] Number of increments that the job will do.
>          */
>         __u32 num_incrs;
>
>         /*
>          * [out] Value the syncpoint will have once all increments
>          * have executed.
>          */
>         __u32 fence_value;
>
>         __u32 reserved[1];
> };
>
> /* Sets paddr/IOVA bit 39 on T194 to enable MC swizzling */
> #define DRM_TEGRA_SUBMIT_RELOCATION_BLOCKLINEAR (1 << 0)
>
> struct drm_tegra_submit_relocation {
>         /* [in] Index of the GATHER or GATHER_UPTR command in
>          * commands. */
>         __u32 gather_command_index;
>
>         /*
>          * [in] Mapping ID (obtained through CHANNEL_MAP) of the
>          * memory whose address will be patched in.
>          */
>         __u32 mapping_id;
>
>         /*
>          * [in] Offset in the gather that will be patched.
>          */
>         __u64 gather_offset;
>
>         /*
>          * [in] Offset in the target buffer whose paddr/IOVA will be
>          * written to the gather.
>          */
>         __u64 target_offset;
>
>         /*
>          * [in] Number of bits the resulting address will be logically
>          * shifted right before writing to the gather.
>          */
>         __u32 shift;
>
>         __u32 reserved[1];
> };
>
> /* Command is an opcode gather from a GEM handle */
> #define DRM_TEGRA_SUBMIT_COMMAND_GATHER         0
> /* Command is an opcode gather from a user pointer */
> #define DRM_TEGRA_SUBMIT_COMMAND_GATHER_UPTR    1
> /* Command is a wait for syncpt fence completion */
> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCPT    2
> /* Command is a wait for SYNC_FILE FD completion */
> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNC_FILE 3
> /* Command is a wait for DRM syncobj completion */
> #define DRM_TEGRA_SUBMIT_COMMAND_WAIT_SYNCOBJ   4
>
> /*
>  * Allow the driver to skip command execution if the engine
>  * was not accessed by another channel between submissions.
>  */
> #define DRM_TEGRA_SUBMIT_CONTEXT_SETUP (1 << 0)
>
> struct drm_tegra_submit_command {
>         __u16 type;
>         __u16 flags;
>
>         union {
>                 struct {
>                         /* GEM handle */
>                         __u32 handle;
>
>                         /*
>                          * Offset into the GEM object, in bytes.
>                          * Must be aligned to 4 bytes.
>                          */
>                         __u64 offset;
>
>                         /*
>                          * Length of the gather, in bytes.
>                          * Must be aligned to 4 bytes.
>                          */
>                         __u64 length;
>                 } gather;
>
>                 struct {
>                         __u32 reserved[1];
>
>                         /*
>                          * Pointer to the gather data.
>                          * Must be aligned to 4 bytes.
>                          */
>                         __u64 base;
>
>                         /*
>                          * Length of the gather, in bytes.
>                          * Must be aligned to 4 bytes.
>                          */
>                         __u64 length;
>                 } gather_uptr;
>
>                 struct {
>                         __u32 syncpt_id;
>                         __u32 threshold;
>
>                         __u32 reserved[1];
>                 } wait_syncpt;
>
>                 struct {
>                         __s32 fd;
>                 } wait_sync_file;
>
>                 struct {
>                         __u32 handle;
>                 } wait_syncobj;
>         };
> };
>
> #define DRM_TEGRA_SUBMIT_CREATE_POST_SYNC_FILE (1 << 0)
> #define DRM_TEGRA_SUBMIT_CREATE_POST_SYNCOBJ   (1 << 1)
>
> struct drm_tegra_channel_submit {
>         __u32 channel_id;
>         __u32 flags;
>
>         /**
>          * [in] Timeout in microseconds after which the kernel may
>          * consider the job to have hung and may reap it and
>          * fast-forward its syncpoint increments.
>          *
>          * The value may be capped by the kernel.
>          */
>         __u32 timeout;
>
>         __u32 num_syncpt_incrs;
>         __u32 num_relocations;
>         __u32 num_commands;
>
>         __u64 syncpt_incrs;
>         __u64 relocations;
>         __u64 commands;
>
>         /**
>          * [out] Invalid, a SYNC_FILE FD or a syncobj handle,
>          * depending on whether DRM_TEGRA_SUBMIT_CREATE_POST_SYNC_FILE,
>          * DRM_TEGRA_SUBMIT_CREATE_POST_SYNCOBJ, or neither is passed.
>          * Passing both is an error.
>          *
>          * The created fence object is signaled when all syncpoint
>          * increments specified in `syncpt_incrs` have executed.
>          */
>         __u32 post_fence;
>
>         __u32 reserved[3];
> };
> ```
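>
> Putting the pieces together, a minimal submission could look like the
> sketch below. DRM_IOCTL_TEGRA_CHANNEL_SUBMIT is an assumed macro
> name, and the gather contents are left empty since the opcode
> encoding is class-specific and outside the scope of this proposal.
>
> ```
> #include <linux/types.h>
> #include <stdint.h>
> #include <sys/ioctl.h>
>
> /* Submit one userspace-built gather plus one syncpoint increment. */
> int submit_job(int drm_fd, __u32 channel_id, int syncpt_fd)
> {
>         /* Host1x opcodes would go here; encoding not covered here. */
>         __u32 gather_data[4] = { 0 };
>
>         struct drm_tegra_submit_syncpt_incr incr = {
>                 .syncpt_fd = syncpt_fd, /* from HOST1X_ALLOCATE_SYNCPOINT */
>                 .num_incrs = 1,
>         };
>
>         struct drm_tegra_submit_command cmd = {
>                 .type = DRM_TEGRA_SUBMIT_COMMAND_GATHER_UPTR,
>                 .gather_uptr = {
>                         .base = (__u64)(uintptr_t)gather_data,
>                         .length = sizeof(gather_data),
>                 },
>         };
>
>         struct drm_tegra_channel_submit submit = {
>                 .channel_id = channel_id,
>                 .flags = DRM_TEGRA_SUBMIT_CREATE_POST_SYNC_FILE,
>                 .timeout = 10000, /* us; may be capped by the kernel */
>                 .num_syncpt_incrs = 1,
>                 .num_commands = 1,
>                 .syncpt_incrs = (__u64)(uintptr_t)&incr,
>                 .commands = (__u64)(uintptr_t)&cmd,
>         };
>
>         if (ioctl(drm_fd, DRM_IOCTL_TEGRA_CHANNEL_SUBMIT, &submit) < 0)
>                 return -1;
>
>         /* post_fence is a sync_file FD; poll() it to wait for the
>          * job, or use incr.fence_value as the final threshold. */
>         return submit.post_fence;
> }
> ```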
>
> ## Example workflows
>
> ### Image processing with TegraDRM/VIC
>
> This example is a simple single-step operation using VIC through
> TegraDRM. For example, assume we have a dma-buf FD with an image we
> want to convert from YUV to RGB. This can occur, for example, as part
> of video decoding.
>
> Syncpoint initialization
>
> 1. Allocate a syncpoint (HOST1X_ALLOCATE_SYNCPOINT)
>    1. This is used to track VIC submission completion.
> 2. Retrieve the syncpoint ID (HOST1X_SYNCPOINT_INFO)
>    1. The ID is required to program the increment as part of the
>       submission.
>
> Buffer allocation
>
> 3. Allocate memory for configuration buffers (DMA Heaps)
> 4. Import the configuration buffer dma-buf as a GEM object
> 5. Import the input image dma-buf as a GEM object
>
> Channel initialization
>
> 6. Open a VIC channel (DRM_TEGRA_OPEN_CHANNEL)
> 7. Map the buffers for access by VIC (DRM_TEGRA_CHANNEL_MAP)
> 8. Create the Host1x opcode buffer as userspace memory
>    1. If buffer mapping returned an IOVA, that IOVA can be placed
>       directly into the buffer. Otherwise, a relocation has to be
>       passed as part of the submission.
>    2. The buffer should contain a syncpoint increment for the
>       syncpoint allocated earlier.
> 9. Submit the work, passing in the syncpoint file descriptor allocated
>    at the beginning. The submit optionally returns a syncfd/syncobj
>    that can be used to wait for submission completion.
>    1. If more fine-grained syncpoint waiting is required, the
>       `fence_value` out-parameter of `drm_tegra_submit_syncpt_incr`
>       can be used in conjunction with HOST1X_CREATE_FENCE to create
>       specific fences.
>
> ### Camera-GPU-CPU pipeline without TegraDRM
>
> This example shows a pipeline with image input from a camera being
> processed using the GPU programmed from userspace, and then finally
> analyzed by the CPU. This kind of pipeline can occur, for example, as
> part of a computer vision use case.
>
> Syncpoint initialization
>
> 1. The camera V4L2 driver allocates a syncpoint internally within the
>    kernel.
> 2. For CPU job tracking, allocate a syncpoint as in "Image processing
>    with TegraDRM/VIC".
> 3. For GPU job tracking, the GPU kernel driver would allocate a
>    syncpoint and assign it such that the GPU channel can access it.
>
> Camera pipeline step
>
> 4. Allocate a dma-buf to store the captured image.
> 5. Trigger the camera capture and store the resulting sync_file FD.
>
> GPU pipeline step
>
> 6. Use HOST1X_GET_FENCE_INFO to extract the syncpoint ID/threshold
>    pair(s) from the camera step's post-fence sync_file FD. If the
>    sync_file FD is not backed by syncpoints, wait for the sync_file FD
>    to signal otherwise (e.g. by polling it).
> 7. Use HOST1X_CREATE_FENCE to create a postfence that is signaled when
>    the GPU step is complete.
> 8. Program the GPU to
>    1. Wait for the syncpoint thresholds extracted from the camera
>       postfence, if we were able to do so.
>    2. Execute image processing on the GPU.
>    3. Increment the GPU's job tracking syncpoint, causing the GPU
>       post-fence FD to be signaled.
>
> CPU pipeline step
>
> 9. Wait for the GPU's post-fence sync_file FD.
> 10. Map the dma-buf containing the image and retrieve the results.
>
> In place of the GPU pipeline step, a similar workflow would apply for
> a step executed on the CPU.
>
> --
>
> thanks,
> Mikko