On 8/14/2024 2:24 PM, Lizhi Hou wrote:
On 8/14/24 11:46, Jeffrey Hugo wrote:
On 8/14/2024 12:16 PM, Lizhi Hou wrote:
On 8/9/24 09:11, Jeffrey Hugo wrote:
On 8/5/2024 11:39 AM, Lizhi Hou wrote:
diff --git a/drivers/accel/amdxdna/aie2_pci.c
b/drivers/accel/amdxdna/aie2_pci.c
new file mode 100644
index 000000000000..3660967c00e6
--- /dev/null
+++ b/drivers/accel/amdxdna/aie2_pci.c
@@ -0,0 +1,182 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2023-2024, Advanced Micro Devices, Inc.
+ */
+
+#include <linux/amd-iommu.h>
+#include <linux/errno.h>
+#include <linux/firmware.h>
+#include <linux/iommu.h>
You are clearly missing linux/pci.h and I suspect many more.
Other headers are indirectly included by "aie2_pci.h" underneath.
aie2_pci.h also does not directly include linux/pci.h
it is aie2_pci.h --> amdxdna_pci_drv.h --> linux/pci.h.
It compiles without any issue.
I did not doubt that it compiled. To be clear, I'm pointing out what I
believe is poor style - relying on implicit includes.
There is no reason that amdxdna_pci_drv.h needs to include linux/pci.h
(at-least in this patch). That header file doesn't use any of the
content from pci.h. If this were merged, I suspect it would be valid
for someone to post a cleanup patch that removes pci.h from
amdxdna_pci_drv.h.
The problem comes when someone does a tree wide refactor of some header
- perhaps moving a function out of pci.h into something else. If they
grep the source tree, they'll find amdxdna_pci_drv.h includes pci.h but
really doesn't use it. They likely won't see aie2_pci.c which may break
because of the refactor. Even more problematic is if pci.h is including
something that you need, and you are not including it anywhere. The
code will still compile, but maybe in the next kernel cycle pci.h no
longer includes that thing. Your code will break.
The 4 includes you have here seems entirely too little, and I'm not
clearly seeing the logic of what gets explicitly included vs what is
implicitly included. firmware.h is explicitly included, but pci.h is
not, yet it seems like you use a lot more from pci.h.
There is the include what you use project that attempts to automate
this, although I don't know how well it works with kernel code -
https://github.com/include-what-you-use/include-what-you-use
+
+ ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
+ if (ret) {
+ XDNA_ERR(xdna, "Failed to set DMA mask: %d", ret);
+ goto release_fw;
+ }
+
+ nvec = pci_msix_vec_count(pdev);
This feels weird. Can your device advertise variable number of
MSI-X vectors? It only works if all of the vectors are used?
That is possible. the driver supports different hardware. And the fw
assigns vector for hardware context dynamically. So the driver needs
to allocate all vectors ahead.
So, if the device requests N MSIs, but the host is only able to
satisfy 1 (or some number less than N), the fw is completely unable to
function?
The fw may return interrupt 2 is assigned to hardware context. Then the
driver may not deal with it in this case. I think it is ok to fail if
the system has very limited resource.
Ok. I suspect you'll want to change that behavior in the future with a
fw update, but if this is how things work today, then this is how the
driver must be.