Re: [PATCH v4 1/8] accel/qaic: Add documentation for AIC100 accelerator driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/21/2023 7:31 AM, Bagas Sanjaya wrote:
On Mon, Mar 20, 2023 at 09:11:07AM -0600, Jeffrey Hugo wrote:
+AIC100 defines a number of MHI channels for different purposes. This is a list
+of the defined channels, and their uses.
+
+| QAIC_LOOPBACK
+| Channels 0 & 1
+| Valid for AMSS
+| Any data sent to the device on this channel is sent back to the host.
+
+| QAIC_SAHARA
+| Channels 2 & 3
+| Valid for SBL
+| Used by SBL to obtain the runtime firmware from the host.
+
+| QAIC_DIAG
+| Channels 4 & 5
+| Valid for AMSS
+| Used to communicate with QSM via the Diag protocol.
+
+| QAIC_SSR
+| Channels 6 & 7
+| Valid for AMSS
+| Used to notify the host of subsystem restart events, and to offload SSR crashdumps.
+
+| QAIC_QDSS
+| Channels 8 & 9
+| Valid for AMSS
+| Used for the Qualcomm Debug Subsystem.
+
+| QAIC_CONTROL
+| Channels 10 & 11
+| Valid for AMSS
+| Used for the Neural Network Control (NNC) protocol.  This is the primary channel between host and
+  QSM for managing workloads.
+
+| QAIC_LOGGING
+| Channels 12 & 13
+| Valid for SBL
+| Used by the SBL to send the bootlog to the host.
+
+| QAIC_STATUS
+| Channels 14 & 15
+| Valid for AMSS
+| Used to notify the host of Reliability, Accessibility, Serviceability (RAS) events.
+
+| QAIC_TELEMETRY
+| Channels 16 & 17
+| Valid for AMSS
+| Used to get/set power/thermal/etc attributes.
+
+| QAIC_DEBUG
+| Channels 18 & 19
+| Valid for AMSS
+| Not used.
+
+| QAIC_TIMESYNC
+| Channels 20 & 21
+| Valid for SBL/AMSS
+| Used to synchronize timestamps in the device side logs with the host time source.

For the tabular data above, use table syntax instead.

Done.

+A request FIFO element has the following structure:
+
+| {
+|	u16 req_id;
+|	u8  seq_id;
+|	u8  pcie_dma_cmd;
+|	u32 reserved;
+|	u64 pcie_dma_source_addr;
+|	u64 pcie_dma_dest_addr;
+|	u32 pcie_dma_len;
+|	u32 reserved;
+|	u64 doorbell_addr;
+|	u8  doorbell_attr;
+|	u8  reserved;
+|	u16 reserved;
+|	u32 doorbell_data;
+|	u32 sem_cmd0;
+|	u32 sem_cmd1;
+|	u32 sem_cmd2;
+|	u32 sem_cmd3;
+| }

IMO code blocks should better fit the struct listing above.

Done

+
+Request field descriptions:
+
+| req_id- request ID. A request FIFO element and a response FIFO element with
+|         the same request ID refer to the same command.
+
+| seq_id- sequence ID within a request. Ignored by the DMA Bridge.
+
+| pcie_dma_cmd- describes the DMA element of this request.
+| 	Bit(7) is the force msi flag, which overrides the DMA Bridge MSI logic
+| 		and generates a MSI when this request is complete, and QSM
+| 		configures the DMA Bridge to look at this bit.
+| 	Bits(6:5) are reserved.
+| 	Bit(4) is the completion code flag, and indicates that the DMA Bridge
+| 		shall generate a response FIFO element when this request is
+| 		complete.
+| 	Bit(3) indicates if this request is a linked list transfer(0) or a bulk
+| 		transfer(1).
+| 	Bit(2) is reserved.
+| 	Bits(1:0) indicate the type of transfer. No transfer(0), to device(1),
+| 		from device(2). Value 3 is illegal.
+
+| pcie_dma_source_addr- source address for a bulk transfer, or the address of
+|         the linked list.
+
+| pcie_dma_dest_addr- destination address for a bulk transfer.
+
+| pcie_dma_len- length of the bulk transfer. Note that the size of this field
+| 	limits transfers to 4G in size.
+
+| doorbell_addr- address of the doorbell to ring when this request is complete.
+
+| doorbell_attr- doorbell attributes.
+| 	Bit(7) indicates if a write to a doorbell is to occur.
+| 	Bits(6:2) are reserved.
+| 	Bits(1:0) contain the encoding of the doorbell length. 0 is 32-bit,
+| 		1 is 16-bit, 2 is 8-bit, 3 is reserved. The doorbell address
+| 		must be naturally aligned to the specified length.
+
+| doorbell_data- data to write to the doorbell. Only the bits corresponding to
+| 	the doorbell length are valid.
+
+| sem_cmdN- semaphore command.
+| 	Bit(31) indicates this semaphore command is enabled.
+| 	Bit(30) is the to-device DMA fence. Block this request until all
+| 		to-device DMA transfers are complete.
+| 	Bit(29) is the from-device DMA fence. Block this request until all
+| 		from-device DMA transfers are complete.
+| 	Bits(28:27) are reserved.
+| 	Bits(26:24) are the semaphore command. 0 is NOP. 1 is init with the
+| 		specified value. 2 is increment. 3 is decrement. 4 is wait
+| 		until the semaphore is equal to the specified value. 5 is wait
+| 		until the semaphore is greater or equal to the specified value.
+| 		6 is "P", wait until semaphore is greater than 0, then
+| 		decrement by 1. 7 is reserved.
+| 	Bit(23) is reserved.
+| 	Bit(22) is the semaphore sync. 0 is post sync, which means that the
+| 		semaphore operation is done after the DMA transfer. 1 is
+| 		presync, which gates the DMA transfer. Only one presync is
+| 		allowed per request.
+| 	Bit(21) is reserved.
+| 	Bits(20:16) is the index of the semaphore to operate on.
+| 	Bits(15:12) are reserved.
+| 	Bits(11:0) are the semaphore value to use in operations.

What about bullet lists?

It doesn't feel like the above fits into a bullet list structure. Do you have an example reference, or can you transform a small section of the above to illustrate what you are thinking?

+
+| {
+| 	u16 req_id;
+| 	u16 completion_code;
+| }

Again, use code blocks.

Done

+passthrough- Allows userspace to send an opaque payload directly to the QSM.
+This is used for NNC commands. Userspace is responsible for managing
+the QSM message requirements in the payload
+
+dma_xfer- DMA transfer. Describes an object that the QSM should DMA into the
+device via address and size tuples.
+
+activate- Activate a workload onto NSPs. The host must provide memory to be
+used by the DBC.
+
+deactivate- Deactivate an active workload and return the NSPs to idle.
+
+status- Query the QSM about it's NNC implementation. Returns the NNC version,
+and if CRC is used.
+
+terminate- Release a user's resources.
+
+dma_xfer_cont- Continuation of a previous DMA transfer. If a DMA transfer
+cannot be specified in a single message (highly fragmented), this
+transaction can be used to specify more ranges.
+
+validate_partition- Query to QSM to determine if a partition identifier is
+valid.

What about using definition lists?

Done

+DRM_IOCTL_QAIC_MANAGE:
+This IOCTL allows userspace to send a NNC request to the QSM. The call will
+block until a response is received, or the request has timed out.
+
+DRM_IOCTL_QAIC_CREATE_BO:
+This IOCTL allows userspace to allocate a buffer object (BO) which can send or
+receive data from a workload. The call will return a GEM handle that
+represents the allocated buffer. The BO is not usable until it has been sliced
+(see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).
+
+DRM_IOCTL_QAIC_MMAP_BO:
+This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
+userspace process.
+
+DRM_IOCTL_QAIC_ATTACH_SLICE_BO:
+This IOCTL allows userspace to slice a BO in preparation for sending the BO to
+the device. Slicing is the operation of describing what portions of a BO get
+sent where to a workload. This requires a set of DMA transfers for the DMA
+Bridge, and as such, locks the BO to a specific DBC.
+
+DRM_IOCTL_QAIC_EXECUTE_BO:
+This IOCTL allows userspace to submit a set of sliced BOs to the device. The
+call is non-blocking. Success only indicates that the BOs have been queued
+to the device, but does not guarantee they have been executed.
+
+DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO:
+This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace to
+shrink the BOs sent to the device for this specific call. If a BO typically has
+N inputs, but only a subset of those is available, this IOCTL allows userspace
+to indicate that only the first M bytes of the BO should be sent to the device
+to minimize data transfer overhead. This IOCTL dynamically recomputes the
+slicing, and therefore has some processing overhead before the BOs can be queued
+to the device.
+
+DRM_IOCTL_QAIC_WAIT_BO:
+This IOCTL allows userspace to determine when a particular BO has been processed
+by the device. The call will block until either the BO has been processed and
+can be re-queued to the device, or a timeout occurs.
+
+DRM_IOCTL_QAIC_PERF_STATS_BO:
+This IOCTL allows userspace to collect performance statistics on the most
+recent execution of a BO. This allows userspace to construct an end to end
+timeline of the BO processing for a performance analysis.
+
+DRM_IOCTL_QAIC_PART_DEV:
+This IOCTL allows userspace to request a duplicate "shadow device". This extra
+accelN device is associated with a specific partition of resources on the AIC100
+device and can be used for limiting a process to some subset of resources.

Again, I'd like to see definition lists for above.

Done.

See Sphinx primer [1] for examples of markups I point above.

Thanks.

[1]: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html





[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux