Re: [PATCH V2 01/10] accel/amdxdna: Add a new driver for AMD AI Engine

Lizhi Hou <lizhi.hou@xxxxxxx> · Wed, 14 Aug 2024 13:24:09 -0700

On 8/14/24 11:46, Jeffrey Hugo wrote:
On 8/14/2024 12:16 PM, Lizhi Hou wrote:

On 8/9/24 09:11, Jeffrey Hugo wrote:
On 8/5/2024 11:39 AM, Lizhi Hou wrote:

diff --git a/drivers/accel/amdxdna/aie2_pci.c 
b/drivers/accel/amdxdna/aie2_pci.c
new file mode 100644
index 000000000000..3660967c00e6
--- /dev/null
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/amd-iommu.h>
+#include <linux/errno.h>
+#include <linux/firmware.h>
+#include <linux/iommu.h>

You are clearly missing linux/pci.h and I suspect many more.
Other headers are indirectly included by "aie2_pci.h" underneath.

aie2_pci.h also does not directly include linux/pci.h

it is aie2_pci.h --> amdxdna_pci_drv.h --> linux/pci.h.

It compiles without any issue.


+
+    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+    if (ret) {
+        XDNA_ERR(xdna, "Failed to set DMA mask: %d", ret);
+        goto release_fw;
+    }
+
+    nvec = pci_msix_vec_count(pdev);

This feels weird.  Can your device advertise variable number of 
MSI-X vectors?  It only works if all of the vectors are used?
That is possible. the driver supports different hardware. And the fw 
assigns vector for hardware context dynamically. So the driver needs 
to allocate all vectors ahead.

So, if the device requests N MSIs, but the host is only able to 
satisfy 1 (or some number less than N), the fw is completely unable to 
function?
The fw may return interrupt 2 is assigned to hardware context. Then the 
driver may not deal with it in this case. I think it is ok to fail if 
the system has very limited resource.


+struct psp_device *aie2m_psp_create(struct device *dev, struct 
psp_config *conf)
+{
+    struct psp_device *psp;
+    u64 offset;
+
+    psp = devm_kzalloc(dev, sizeof(*psp), GFP_KERNEL);
+    if (!psp)
+        return NULL;
+
+    psp->dev = dev;
+    memcpy(psp->psp_regs, conf->psp_regs, sizeof(psp->psp_regs));
+
+    psp->fw_buf_sz = ALIGN(conf->fw_size, PSP_FW_ALIGN) + 
PSP_FW_ALIGN;
+    psp->fw_buffer = devm_kmalloc(psp->dev, psp->fw_buf_sz, 
GFP_KERNEL);

Feels like this (and a bunch of other instances I haven't commented 
on) should be drmm_* allocs.

The PSP code is kind of low level and directly interact with 
hardware. All the PSP interfaces use struct device * instead of 
drm_device. I think it is kind make sense because PSP is not related 
to drm.

I will scan all other allocs and change them to drmm_* allocs for the 
code related to drm_device. Does this sound ok to you?

I don't think so.  Look up
drm/todo: Add TODO entry for "lints"
on the dri-devel list, and its history.
Ok, I will replace them with drm_*alloc.



+    if (!psp->fw_buffer) {
+        dev_err(psp->dev, "no memory for fw buffer");
+        return NULL;
+    }
+
+    psp->fw_paddr = virt_to_phys(psp->fw_buffer);

I'm pretty sure virt_to_phys() is always wrong

The hardware exposes several registers to communicate with platform 
PSP (AMD Platform Security Processor) to load NPU firmware. And PSP 
only accept host physical address with current hardware.

I understand usually virt_to_phys() should not be needed for device 
driver. And maybe it is ok to use if there is hardware requirement? I 
can see some drivers use it as well.

Eh.  I guess the PSP would never have an IOMMU in front of it or 
anything like that.

This feels similar to what Qualcomm MSM platforms do, which uses the 
remoteproc framework.  Not sure if that helps you here.

This still feels not good, but you might have a valid exception here. 
I'd suggest putting a justification comment in the code through. 
Someone looking at this in X months might raise the same question.
Sure. I will add a justification.



+    offset = ALIGN(psp->fw_paddr, PSP_FW_ALIGN) - psp->fw_paddr;
+    psp->fw_paddr += offset;
+    memcpy(psp->fw_buffer + offset, conf->fw_buf, conf->fw_size);
+
+    return psp;
+}
diff --git a/drivers/accel/amdxdna/amdxdna_drm.c 
b/drivers/accel/amdxdna/amdxdna_drm.c
new file mode 100644
index 000000000000..91e4f9c9dac9
--- /dev/null
+++ b/drivers/accel/amdxdna/amdxdna_drm.c

What is the point of this file?  Seems like all of this could just 
be in amdxdna_pci_drv.c
The future product may have NPU with non-pci device. So it might be a 
amdxdna_plat_drv.c and share the same amdxdna_drm.c in the future.

This seems like a weak justification.  "may" is not definitive. If 
such hardware appears, you could refactor the driver at that time.
Ok, I will merge them.


diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c 
b/drivers/accel/amdxdna/amdxdna_pci_drv.c
new file mode 100644
index 000000000000..7d0cfd918b0e
--- /dev/null
+++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2022-2024, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/module.h>
+
+#include "amdxdna_pci_drv.h"
+
+/*
+ *  There are platforms which share the same PCI device ID
+ *  but have different PCI revision IDs. So, let the PCI class
+ *  determine the probe and later use the (device_id, rev_id)
+ *  pair as a key to select the devices.
+ */

Huh?  So, VID == AMD, DID == 0x17f0, rev == 0x1 is a completely 
different device?  That feels like a PCIe spec violation...
Maybe the comment is misleading. The hardware with same device id 
0x17f0 uses the same commands, registers etc. And they are same 
device with different revisions.

Then I don't understand why you need to do the class matching. Match 
on PCI_VENDOR_ID_AMD with the Device IDs you need to support like a 
"normal" PCI(e) driver?

ok. I will used device id to bind.


Thanks,

Lizhi



+static const struct pci_device_id pci_ids[] = {
+    { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_ANY_ID),
+        .class = PCI_CLASS_SP_OTHER << 8,

Weird.  I would have expected the Accelerator class to be used
We contacted our hardware team to figure out why accelerator class is 
not used here. Some of hardware is already released. Hopefully 
hardware team may consider to use accelerator class with new products.