On 2020-08-19 at 11:09 p.m., Huang Rui wrote:
> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote:
>> On 2020-08-19 7:56 p.m., Huang Rui wrote:
>>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote:
>>>> On 2020-08-19 at 7:06 a.m., Huang Rui wrote:
>>>>> We still have a few iommu issues which need to address, so force raven
>>>>> as "dgpu" path for the moment.
>>>>>
>>>>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled
>>>>> or ACPI CRAT table not correct.
>>>>>
>>>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2.
>>>>> v3: Align with existed thunk, don't change the way of raven, only renoir
>>>>> will use "dgpu" path by default.
>>>>>
>>>>> Signed-off-by: Huang Rui <ray.huang@xxxxxxx>
>>>>> ---
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c   |  5 +++-
>>>>>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 28 ++++++++++++++++++++++-
>>>>>  drivers/gpu/drm/amd/amdkfd/kfd_device.c   |  2 +-
>>>>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  2 +-
>>>>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  1 +
>>>>>  5 files changed, 34 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> index a9a4319c24ae..189f9d7e190d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar,
>>>>>   * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT
>>>>>   * table to get information about AMD APUs. This option can serve as a workaround on
>>>>>   * systems with a broken CRAT table.
>>>>> + *
>>>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide
>>>>> + * whehter use CRAT)
>>>>>   */
>>>>>  int ignore_crat;
>>>>>  module_param(ignore_crat, int, 0444);
>>>>>  MODULE_PARM_DESC(ignore_crat,
>>>>> -    "Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
>>>>> +    "Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)");
>>>>>
>>>>>  /**
>>>>>   * DOC: halt_if_hws_hang (int)
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>> index 59557e3e206a..f8346d4402e2 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>>>> @@ -22,6 +22,7 @@
>>>>>
>>>>>  #include <linux/pci.h>
>>>>>  #include <linux/acpi.h>
>>>>> +#include <asm/processor.h>
>>>>>  #include "kfd_crat.h"
>>>>>  #include "kfd_priv.h"
>>>>>  #include "kfd_topology.h"
>>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>      return 0;
>>>>>  }
>>>>>
>>>>> +
>>>>> +#ifdef CONFIG_ACPI
>>>>> +static void kfd_setup_ignore_crat_option(void)
>>>>> +{
>>>>> +
>>>>> +    if (ignore_crat)
>>>>> +        return;
>>>>> +
>>>>> +#ifndef KFD_SUPPORT_IOMMU_V2
>>>>> +    ignore_crat = 1;
>>>>> +#else
>>>>> +    ignore_crat = 0;
>>>>> +#endif
>>>>> +
>>>>> +    /* Renoir use the fallback path to align with existed thunk */
>>>> Are you sure you need special code for Renoir here? For Renoir,
>>>> dev->device_info already treats it as a dGPU and always has.
>>> Renoir is also an APU. In other words, we might get a correct CRAT
>>> table from the SBIOS (the CRAT table in the SBIOS for Renoir is broken
>>> so far). If we got a CRAT table, KFD would create an APU node, which is
>>> not expected.
>> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT
>> table because gpu->device_info->needs_iommu_device is false for Renoir.
>> So Renoir will always show up in the topology as its own discrete GPU node.
>>
>> How does this work today? Renoir is already treated as a dGPU. But does
>> the CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from
>> the CRAT table still show GPU cores?
>>
>> Regards,
>> Felix
>>
>>
>>>> I don't like the whole idea of changing the value of a module parameter,
>>>> because it is global and visible to the user through sysfs. Instead, if
>>>> you need to override the value of ignore_crat to consider other
>>>> conditions, I think kfd_device_use_iommu_v2 and
>>>> kfd_create_crat_image_acpi would be the right places to do it.
>>>>
>>>> To avoid duplicating the conditions, you could add a helper function
>>>> bool kfd_ignore_crat(void) that can be called instead of using the
>>>> ignore_crat parameter directly. This function, which changes the global
>>>> module parameter, should be removed.
>>> That makes sense. I will update it in the next version.
>>>
>>>>> +    if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
>>>>> +        boot_cpu_data.x86 == 0x17 &&
>>>>> +        boot_cpu_data.x86_model >= 0x60 && boot_cpu_data.x86_model < 0x70) {
>>>>> +        ignore_crat = 1;
>>>>> +    }
>>>>> +
>>>>> +    return;
>>>>> +}
>>>>> +
>>>>>  /*
>>>>>   * kfd_create_crat_image_acpi - Allocates memory for CRAT image and
>>>>>   * copies CRAT from ACPI (if available).
>>>>> @@ -751,7 +776,6 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
>>>>>   *
>>>>>   * Return 0 if successful else return error code
>>>>>   */
>>>>> -#ifdef CONFIG_ACPI
>>>>>  int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>>>  {
>>>>>      struct acpi_table_header *crat_table;
>>>>> @@ -775,6 +799,8 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size)
>>>>>          return -EINVAL;
>>>>>      }
>>>>>
>>>>> +    kfd_setup_ignore_crat_option();
>>>>> +
>>>>>      if (ignore_crat) {
>>>>>          pr_info("CRAT table disabled by module option\n");
>>>>>          return -ENODATA;
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>>> index 2c030c2b5b8d..dab44951c4d8 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>>> @@ -112,6 +112,7 @@ static const struct kfd_device_info carrizo_device_info = {
>>>>>      .num_xgmi_sdma_engines = 0,
>>>>>      .num_sdma_queues_per_engine = 2,
>>>>>  };
>>>>> +#endif
>>>>>
>>>>>  static const struct kfd_device_info raven_device_info = {
>>>>>      .asic_family = CHIP_RAVEN,
>>>>> @@ -130,7 +131,6 @@ static const struct kfd_device_info raven_device_info = {
>>>>>      .num_xgmi_sdma_engines = 0,
>>>>>      .num_sdma_queues_per_engine = 2,
>>>>>  };
>>>>> -#endif
>>>>>
>>>>>  static const struct kfd_device_info hawaii_device_info = {
>>>>>      .asic_family = CHIP_HAWAII,
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>> index 82f955750e75..4b6e7ef7a71c 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>>> @@ -1234,7 +1234,7 @@ static inline int kfd_devcgroup_check_permission(struct kfd_dev *kfd)
>>>>>
>>>>>  static inline bool kfd_device_use_iommu_v2(const struct kfd_dev *dev)
>>>>>  {
>>>>> -    return dev && dev->device_info->needs_iommu_device;
>>>>> +    return !ignore_crat && dev && dev->device_info->needs_iommu_device;
>>>>>  }
>>>>>
>>>>>  /* Debugfs */
>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> index 4b29815e9205..b92ce75a4c53 100644
>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>>> @@ -1090,6 +1090,7 @@ int kfd_topology_init(void)
>>>>>              COMPUTE_UNIT_CPU, NULL,
>>>>>              proximity_domain);
>>>>>          cpu_only_node = 1;
>>>>> +        ignore_crat = 1;
>>>> Don't change the global variable. I think you're doing this here in case
>>>> the CRAT table is broken and contains no GPU info. Maybe we need to add
>>>> a new flag "use_iommu_v2" to the kfd_dev structure to handle this.
>>>>
> I just found that kfd_dev is not initialized here, so we may be unable
> to use a flag in kfd_dev.

I see. This is very early during module init. When you get here, you
have already failed to read the ACPI CRAT table and created a VCRAT for
the CPU with no GPU cores. If you wanted to add a per-device
"use_iommu_v2" flag, you could probably set it in kfd_assign_gpu when it
assigns a KFD device to a node with CPU cores.

Regards,
Felix

>
> Thanks,
> Ray
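
For illustration only, the read-only helper suggested earlier in the thread (bool kfd_ignore_crat(void)) could look roughly like the sketch below. It is a standalone model, not kernel code: the struct crat_policy_inputs type and the *_sketch function names are hypothetical and do not exist in amdkfd. They stand in for the ignore_crat module parameter, the KFD_SUPPORT_IOMMU_V2 build option, and the boot_cpu_data check from the patch. The Renoir condition is carried over from the patch as posted, even though the review questions whether it is needed.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for kernel state. None of these types or names
 * come from the amdkfd driver; they only model the inputs discussed in
 * the thread so the decision logic can be compiled and tested on its own.
 */
struct crat_policy_inputs {
    int  ignore_crat_param;  /* user-set module parameter: 0 = auto, 1 = ignore */
    bool iommu_v2_supported; /* models KFD_SUPPORT_IOMMU_V2 being defined */
    bool cpu_is_renoir;      /* models the boot_cpu_data check from the patch */
};

/* Mirrors the patch's CPU check: AMD, family 0x17, model 0x60..0x6f. */
static bool is_renoir_cpu_sketch(bool vendor_is_amd, unsigned int family,
                                 unsigned int model)
{
    return vendor_is_amd && family == 0x17 && model >= 0x60 && model < 0x70;
}

/*
 * Returns whether CRAT should be ignored. The inputs are read-only, so
 * the value the user sees for the module parameter never changes.
 */
static bool kfd_ignore_crat_sketch(const struct crat_policy_inputs *in)
{
    if (in->ignore_crat_param)
        return true;            /* explicit user request */
    if (!in->iommu_v2_supported)
        return true;            /* no IOMMUv2 support built in */
    return in->cpu_is_renoir;   /* Renoir takes the fallback path */
}
```

Callers such as kfd_device_use_iommu_v2 and kfd_create_crat_image_acpi would then ask the helper instead of reading or writing ignore_crat directly, which keeps the conditions in one place and leaves the sysfs-visible parameter untouched.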