On 10/4/24 10:06, Jeffrey Hugo wrote:
On 9/11/2024 12:05 PM, Lizhi Hou wrote:
AMD NPU (Neural Processing Unit) is a multi-user AI inference accelerator
integrated into AMD client APUs. The NPU enables efficient execution of
Machine Learning applications like CNN, LLM, etc. The NPU is based on the
AMD XDNA Architecture and is managed by the amdxdna driver.
Co-developed-by: Sonal Santan <sonal.santan@xxxxxxx>
Signed-off-by: Sonal Santan <sonal.santan@xxxxxxx>
Signed-off-by: Lizhi Hou <lizhi.hou@xxxxxxx>
---
Documentation/accel/amdxdna/amdnpu.rst | 283 +++++++++++++++++++++++++
Documentation/accel/amdxdna/index.rst | 11 +
Documentation/accel/index.rst | 1 +
3 files changed, 295 insertions(+)
create mode 100644 Documentation/accel/amdxdna/amdnpu.rst
create mode 100644 Documentation/accel/amdxdna/index.rst
diff --git a/Documentation/accel/amdxdna/amdnpu.rst b/Documentation/accel/amdxdna/amdnpu.rst
new file mode 100644
index 000000000000..2af3bc5b2a9e
--- /dev/null
+++ b/Documentation/accel/amdxdna/amdnpu.rst
@@ -0,0 +1,283 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+.. include:: <isonum.txt>
+
+.. SPDX-License-Identifier: GPL-2.0-only
SPDX twice?
I will remove one.
+
+=========
+ AMD NPU
+=========
+
+:Copyright: |copy| 2024 Advanced Micro Devices, Inc.
+:Author: Sonal Santan <sonal.santan@xxxxxxx>
+
+Overview
+========
+
+AMD NPU (Neural Processing Unit) is a multi-user AI inference accelerator
+integrated into AMD client APUs. The NPU enables efficient execution of
+Machine Learning applications like CNN, LLM, etc. The NPU is based on the
+`AMD XDNA Architecture`_ and is managed by the **amdxdna** driver.
+
+
+Hardware Description
+====================
+
+AMD NPU consists of the following hardware components:
+
+AMD XDNA Array
+--------------
+
+AMD XDNA Array comprises a 2D array of compute and memory tiles built with
+`AMD AI Engine Technology`_. Each column has 4 rows of compute tiles and
+1 row of memory tiles. Each compute tile contains a VLIW processor with its
+own dedicated program and data memory. The memory tile acts as L2 memory.
+The 2D array can be partitioned at a column boundary, creating a spatially
+isolated partition which can be bound to a workload context.
+
+Each column also has dedicated DMA engines to move data between host DDR
+and the memory tile.
+
+AMD Phoenix and AMD Hawk Point client NPUs have a 4x5 topology, i.e., 4 rows
+of compute tiles arranged into 5 columns. The AMD Strix Point client APU has
+a 4x8 topology, i.e., 4 rows of compute tiles arranged into 8 columns.
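As an aside for readers, the column-granular partitioning described above
could be modeled roughly as below. This is an illustrative sketch only; the
topology values come from the paragraph above, but all names and types are
hypothetical and not part of the amdxdna driver:

  #include <linux/types.h>

  /* Hypothetical model of the NPU tile array and its column-granular
   * partitioning -- illustration only, not the driver's actual API.
   */
  struct npu_topology {
          unsigned int rows;      /* rows of compute tiles per column */
          unsigned int cols;      /* columns in the array */
  };

  static const struct npu_topology phoenix_topo = { .rows = 4, .cols = 5 };
  static const struct npu_topology strix_topo   = { .rows = 4, .cols = 8 };

  /* A partition is a contiguous range of whole columns: it can only be
   * carved out at a column boundary, never mid-column, and is then bound
   * to a single workload context.
   */
  static bool npu_partition_valid(const struct npu_topology *topo,
                                  unsigned int start_col, unsigned int ncols)
  {
          return ncols > 0 && start_col + ncols <= topo->cols;
  }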
+
+Shared L2 Memory
+................
Why a line of "." instead of "-" like elsewhere?
I will fix it.
+
+The single row of memory tiles creates a pool of software-managed on-chip L2
+memory. DMA engines are used to move data between host DDR and memory tiles.
+AMD Phoenix and AMD Hawk Point NPUs have a total of 2560 KB of L2 memory.
+The AMD Strix Point NPU has a total of 4096 KB of L2 memory.
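(Aside: both totals work out to 512 KB of L2 per column, since
2560 KB / 5 columns = 512 KB and 4096 KB / 8 columns = 512 KB. That matches
the single memory-tile row scaling with the column count, assuming the
memory tiles are the only L2 contributors.)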
+
+Microcontroller
+---------------
+
+A microcontroller runs the NPU Firmware, which is responsible for command
+processing, XDNA Array partition setup, XDNA Array configuration, workload
+context management and workload orchestration.
+
+NPU Firmware uses a dedicated instance of an isolated, non-privileged context
+called ERT to service each workload context. ERT is also used to execute
+user-provided ``ctrlcode`` associated with the workload context.
+
+NPU Firmware uses a single isolated privileged context called MERT to
+service management commands from the amdxdna driver.
+
+Mailboxes
+.........
Again, odd delimiter
+
+The microcontroller and amdxdna driver use a privileged channel for
+management tasks like setting up contexts, telemetry, queries, error
+handling, setting up the user channel, etc. As mentioned before, privileged
+channel requests are serviced by MERT. The privileged channel is bound to a
+single mailbox.
+
+The microcontroller and amdxdna driver use a dedicated user channel per
+workload context. The user channel is primarily used for submitting work to
+the NPU. As mentioned before, user channel requests are serviced by an
+instance of ERT. Each user channel is bound to its own dedicated mailbox.
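A hypothetical sketch of the channel model just described (one MERT-serviced
privileged channel plus one ERT-serviced user channel per workload context,
each bound to its own mailbox); these names are made up for illustration and
are not from the driver:

  #include <linux/types.h>

  struct npu_mailbox;                     /* opaque; one per channel */

  /* Privileged channel: management traffic, serviced by MERT. */
  struct npu_mgmt_channel {
          struct npu_mailbox *mbox;       /* the single privileged mailbox */
  };

  /* User channel: work submission for one workload context,
   * serviced by a dedicated ERT instance.
   */
  struct npu_user_channel {
          struct npu_mailbox *mbox;       /* dedicated mailbox */
          u32 context_id;                 /* owning workload context */
  };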
+
+PCIe EP
+-------
+
+NPU is visible to the x86 as a PCIe device with multiple BARs and some MSI-X
"to the x86" - feels like something is missing here. Maybe "x86 host
CPU"?
Yes. I will change to "to the x86 host CPU".
+interrupt vectors. NPU uses a dedicated high-bandwidth SoC-level fabric for
+reading from or writing to host memory. Each instance of ERT gets its own
+dedicated MSI-X interrupt. MERT gets a single MSI-X interrupt.
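For readers unfamiliar with the interrupt layout, here is a probe-time
sketch of allocating one MSI-X vector for MERT plus one per ERT instance,
using the standard PCI helpers. The vector-count policy and function name
are assumptions for illustration, not taken from the patch:

  #include <linux/pci.h>

  /* Illustration only: request one vector for MERT plus one per ERT
   * instance. pci_alloc_irq_vectors() may grant fewer than requested,
   * down to the minimum of 1.
   */
  static int npu_setup_irqs(struct pci_dev *pdev, unsigned int num_ert)
  {
          int nvecs;

          nvecs = pci_alloc_irq_vectors(pdev, 1, 1 + num_ert, PCI_IRQ_MSIX);
          if (nvecs < 0)
                  return nvecs;

          /* Hypothetical layout: vector 0 for MERT, vectors 1..n for the
           * ERT instances. pci_irq_vector() maps a vector index to the
           * Linux IRQ number to pass to request_irq().
           */
          return pci_irq_vector(pdev, 0);
  }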
<snip>
diff --git a/Documentation/accel/amdxdna/index.rst b/Documentation/accel/amdxdna/index.rst
new file mode 100644
index 000000000000..38c16939f1fc
--- /dev/null
+++ b/Documentation/accel/amdxdna/index.rst
@@ -0,0 +1,11 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================================
+ accel/amdxdna NPU driver
+=====================================
+
+The accel/amdxdna driver supports the AMD NPU (Neural Processing Unit).
+
+.. toctree::
+
+ amdnpu
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index e94a0160b6a0..0a94b6766263 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -9,6 +9,7 @@ Compute Accelerators
introduction
qaic/index
+ amdxdna/index
Alphabetical order makes sense to me, considering more entries will probably
be added over time. This would suggest that your addition should occur one
line up. What do you think?
I will fix it.
Thanks,
Lizhi
.. only:: subproject and html