On 2020-08-19 7:56 p.m., Huang Rui wrote:
> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
>> On 2020-08-19 at 7:06 a.m., Huang Rui wrote:
>>> We still have a few iommu issues which need to be addressed, so force raven
>>> as "dgpu" path for the moment. This is to add the fallback path to bypass
>>> IOMMU if IOMMU v2 is disabled or the ACPI CRAT table is not correct.
>>>
>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
>>> v3: Align with existing thunk, don't change the way of raven, only renoir
>>>     will use "dgpu" path by default.
>>>
>>> Signed-off-by: Huang Rui <ray.huang@xxxxxxx>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>>>  5 files changed, 34 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> index a9a4319c24ae..189f9d7e190d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>>>   * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>>>   * table to get information about AMD APUs. This option can serve as a workaround on
>>>   * systems with a broken CRAT table.
>>> + *
>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
>>> + * whether to use CRAT)
>>>   */
>>>  int ignore_crat;
>>>  module_param(ignore_crat, int, 0444);
>>>  MODULE_PARM_DESC(ignore_crat,
>>> -	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
>>> +	"Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>>>
>>>  /**
>>>   * DOC: halt_if_hws_hang (int)
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> index 59557e3e206a..f8346d4402e2 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> @@ -22,6 +22,7 @@
>>>
>>>  #include <linux/pci.h>
>>>  #include <linux/acpi.h>
>>> +#include <asm/processor.h>
>>>  #include "kfd_crat.h"
>>>  #include "kfd_priv.h"
>>>  #include "kfd_topology.h"
>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>  	return 0;
>>>  }
>>>
>>> +
>>> +#ifdef CONFIG_ACPI
>>> +static void kfd_setup_ignore_crat_option(void)
>>> +{
>>> +
>>> +	if (ignore_crat)
>>> +		return;
>>> +
>>> +#ifndef KFD_SUPPORT_IOMMU_V2
>>> +	ignore_crat = 1;
>>> +#else
>>> +	ignore_crat = 0;
>>> +#endif
>>> +
>>> +	/* Renoir use the fallback path to align with existed thunk */
>>
>> Are you sure you need special code for Renoir here? For Renoir the
>> dev->device_info already treats it as a dGPU and always has.
>
> Renoir is also an APU. In other words, we might get a correct CRAT table
> from the SBIOS (so far the CRAT table in the SBIOS for Renoir is broken).
> If we got a CRAT table, kfd would create an APU node, which is not
> expected.
kfd_assign_gpu will not assign a Renoir GPU to the APU node from the CRAT table because gpu->device_info->needs_iommu_device is false for Renoir. So Renoir will always show up in the topology as its own discrete GPU node.
How does this work today? Renoir is already treated as a dGPU. But the CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the CRAT table still shows GPU cores?
Regards, Felix
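
The assignment rule described in the reply above can be modelled in a few lines. This is only a simplified, standalone sketch: `struct topo_node`, its fields, and `assign_gpu` are hypothetical stand-ins rather than the driver's actual structures or code. The one thing taken from the message is the rule itself: a GPU whose device_info has needs_iommu_device set to false is never attached to a node that has CPU cores.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a topology node. */
struct topo_node {
	unsigned int cpu_cores;  /* > 0 for a CPU or APU node from the CRAT table */
	unsigned int simd_count; /* > 0 if the node advertises GPU cores */
	bool has_gpu;            /* true once a GPU has been attached */
};

/*
 * Return the index of the node the GPU was attached to, or -1 if no node
 * matched. In this model, -1 stands for "the GPU gets its own discrete node".
 */
static int assign_gpu(struct topo_node *nodes, size_t n, bool needs_iommu_device)
{
	for (size_t i = 0; i < n; i++) {
		/* A GPU treated as discrete must not land on a CPU/APU node. */
		if (!needs_iommu_device && nodes[i].cpu_cores > 0)
			continue;

		if (!nodes[i].has_gpu && nodes[i].simd_count > 0) {
			nodes[i].has_gpu = true;
			return (int)i;
		}
	}
	return -1;
}
```

With a single APU-style node (CPU cores plus SIMDs), a GPU with needs_iommu_device false is not attached to it, which matches the behaviour described for Renoir. The node would still report its GPU cores, which is the situation the question above asks about.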
>> I don't like the whole idea of changing the value of a module
>> parameter, because it is global and visible to the user through sysfs.
>> Instead, if you need to override the value of ignore_crat to consider
>> other conditions, I think kfd_device_use_iommu_v2 and
>> kfd_create_crat_image_acpi would be the right place to do it. To avoid
>> duplicating the conditions, you could add a helper function
>> bool kfd_ignore_crat(void) that can be called instead of using the
>> ignore_crat parameter directly. This function, changing the global
>> module parameter, should be removed.
>
> That makes sense. Will update it in next version.
>
>>> +	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
>>> +	    boot_cpu_data.x86 == 0x17 &&
>>> +	    boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
>>> +		ignore_crat = 1;
>>> +	}
>>> +
>>> +	return;
>>> +}
>>> +
>>>  /*
>>>   * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
>>>   * copies CRAT from ACPI (if available).
>>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>   *
>>>   * Return 0 if successful else return error code
>>>   */
>>> -#ifdef CONFIG_ACPI
>>>  int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>  {
>>>  	struct acpi_table_header *crat_table;
>>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>  		return -EINVAL;
>>>  	}
>>>
>>> +	kfd_setup_ignore_crat_option();
>>> +
>>>  	if (ignore_crat) {
>>>  		pr_info("CRAT table disabled by module option\n");
>>>  		return -ENODATA;
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> index 2c030c2b5b8d..dab44951c4d8 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
>>>  	.num_xgmi_sdma_engines = 0,
>>>  	.num_sdma_queues_per_engine = 2,
>>>  };
>>> +#endif
>>>
>>>  static const struct kfd_device_info raven_device_info = {
>>>  	.asic_family = CHIP_RAVEN,
>>> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
>>>  	.num_xgmi_sdma_engines = 0,
>>>  	.num_sdma_queues_per_engine = 2,
>>>  };
>>> -#endif
>>>
>>>  static const struct kfd_device_info hawaii_device_info = {
>>>  	.asic_family = CHIP_HAWAII,
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index 82f955750e75..4b6e7ef7a71c 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
>>>
>>>  static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
>>>  {
>>> -	return dev && dev->device_info->needs_iommu_device;
>>> +	return !ignore_crat && dev && dev->device_info->needs_iommu_device;
>>>  }
>>>
>>>  /* Debugfs */
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> index 4b29815e9205..b92ce75a4c53 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
>>>  						    COMPUTE_UNIT_CPU, NULL,
>>>  						    proximity_domain);
>>>  		cpu_only_node = 1;
>>> +		ignore_crat = 1;
>>
>> Don't change the global variable. I think you're doing this here in
>> case the CRAT table is broken and contains no GPU info. Maybe we need
>> to add a new flag "use_iommu_v2" into the kfd_dev structure to handle
>> this.
>
> Yes, you're right. Will remove global ignore_crat update. Let me revise
> it again.
>
> Thanks,
> Ray
>
>> Regards,
>>   Felix
>>
>>>  		if (ret) {
>>>  			pr_err("Error creating VCRAT table for CPU\n");
>>>  			return ret;
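
For reference, the helper suggested in the review comments above (bool kfd_ignore_crat(void), called instead of reading ignore_crat directly) could look roughly like the sketch below. It is a standalone model, not driver code: the `struct crat_policy_inputs` type, its field names, and the extra function argument are hypothetical and exist only so the logic can be compiled and exercised outside the kernel. In the driver, the inputs would come from the module parameter, the KFD_SUPPORT_IOMMU_V2 build option, and boot_cpu_data. The conditions mirror the ones in the quoted kfd_setup_ignore_crat_option(), but the parameter is only read, never written.

```c
#include <stdbool.h>

/* Hypothetical bundle of the inputs the decision depends on. */
struct crat_policy_inputs {
	int ignore_crat_param;   /* user-supplied module parameter, read-only here */
	bool iommu_v2_supported; /* stands in for KFD_SUPPORT_IOMMU_V2 */
	bool cpu_is_amd;         /* stands in for x86_vendor == X86_VENDOR_AMD */
	unsigned int cpu_family; /* stands in for boot_cpu_data.x86 */
	unsigned int cpu_model;  /* stands in for boot_cpu_data.x86_model */
};

/* Renoir check as written in the patch: family 0x17, models 0x60 to 0x6f. */
static bool cpu_is_renoir(const struct crat_policy_inputs *in)
{
	return in->cpu_is_amd && in->cpu_family == 0x17 &&
	       in->cpu_model >= 0x60 && in->cpu_model < 0x70;
}

/* Compute the effective decision without modifying the module parameter. */
static bool kfd_ignore_crat(const struct crat_policy_inputs *in)
{
	if (in->ignore_crat_param)
		return true;    /* explicitly requested by the user */

	if (!in->iommu_v2_supported)
		return true;    /* no IOMMU v2 support, use the fallback path */

	return cpu_is_renoir(in); /* Renoir takes the "dgpu" path by default */
}
```

Because the result is computed on every call, the value of ignore_crat that the user sees through sysfs stays exactly what was passed on the command line, which is the concern raised in the review.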
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx