Re: [PATCH V2 01/10] accel/amdxdna: Add a new driver for AMD AI Engine

Jeffrey Hugo <quic_jhugo@xxxxxxxxxxx> · Wed, 14 Aug 2024 15:53:49 -0600

On 8/14/2024 2:24 PM, Lizhi Hou wrote:

On 8/14/24 11:46, Jeffrey Hugo wrote:
On 8/14/2024 12:16 PM, Lizhi Hou wrote:

On 8/9/24 09:11, Jeffrey Hugo wrote:
On 8/5/2024 11:39 AM, Lizhi Hou wrote:

diff --git a/drivers/accel/amdxdna/aie2_pci.c 
b/drivers/accel/amdxdna/aie2_pci.c
new file mode 100644
index 000000000000..3660967c00e6
--- /dev/null
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/amd-iommu.h>
+#include <linux/errno.h>
+#include <linux/firmware.h>
+#include <linux/iommu.h>

You are clearly missing linux/pci.h and I suspect many more.
Other headers are indirectly included by "aie2_pci.h" underneath.

aie2_pci.h also does not directly include linux/pci.h

it is aie2_pci.h --> amdxdna_pci_drv.h --> linux/pci.h.

It compiles without any issue.

I did not doubt that it compiled.  To be clear, I'm pointing out what I 
believe is poor style - relying on implicit includes.

There is no reason that amdxdna_pci_drv.h needs to include linux/pci.h 
(at-least in this patch).  That header file doesn't use any of the 
content from pci.h.  If this were merged, I suspect it would be valid 
for someone to post a cleanup patch that removes pci.h from 
amdxdna_pci_drv.h.

The problem comes when someone does a tree wide refactor of some header 
- perhaps moving a function out of pci.h into something else.  If they 
grep the source tree, they'll find amdxdna_pci_drv.h includes pci.h but 
really doesn't use it.  They likely won't see aie2_pci.c which may break 
because of the refactor.  Even more problematic is if pci.h is including 
something that you need, and you are not including it anywhere.  The 
code will still compile, but maybe in the next kernel cycle pci.h no 
longer includes that thing.  Your code will break.

The 4 includes you have here seems entirely too little, and I'm not 
clearly seeing the logic of what gets explicitly included vs what is 
implicitly included.  firmware.h is explicitly included, but pci.h is 
not, yet it seems like you use a lot more from pci.h.

There is the include what you use project that attempts to automate 
this, although I don't know how well it works with kernel code - 
https://github.com/include-what-you-use/include-what-you-use



+
+    ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+    if (ret) {
+        XDNA_ERR(xdna, "Failed to set DMA mask: %d", ret);
+        goto release_fw;
+    }
+
+    nvec = pci_msix_vec_count(pdev);

This feels weird.  Can your device advertise variable number of 
MSI-X vectors?  It only works if all of the vectors are used?
That is possible. the driver supports different hardware. And the fw 
assigns vector for hardware context dynamically. So the driver needs 
to allocate all vectors ahead.

So, if the device requests N MSIs, but the host is only able to 
satisfy 1 (or some number less than N), the fw is completely unable to 
function?
The fw may return interrupt 2 is assigned to hardware context. Then the 
driver may not deal with it in this case. I think it is ok to fail if 
the system has very limited resource.

Ok.  I suspect you'll want to change that behavior in the future with a 
fw update, but if this is how things work today, then this is how the 
driver must be.