Am 2020-08-20 um 5:38 a.m. schrieb Huang Rui: > On Thu, Aug 20, 2020 at 08:31:25AM +0800, Huang Rui wrote: >> On Thu, Aug 20, 2020 at 08:18:57AM +0800, Kuehling, Felix wrote: >>> On 2020-08-19 7:56 p.m., Huang Rui wrote: >>>> On Wed, Aug 19, 2020 at 11:38:34PM +0800, Kuehling, Felix wrote: >>>>> Am 2020-08-19 um 7:06 a.m. schrieb Huang Rui: >>>>>> We still have a few iommu issues which need to address, so force raven >>>>>> as "dgpu" path for the moment. >>>>>> >>>>>> This is to add the fallback path to bypass IOMMU if IOMMU v2 is disabled >>>>>> or ACPI CRAT table not correct. >>>>>> >>>>>> v2: Use ignore_crat parameter to decide whether it will go with IOMMUv2. >>>>>> v3: Align with existed thunk, don't change the way of raven, only renoir >>>>>> will use "dgpu" path by default. >>>>>> >>>>>> Signed-off-by: Huang Rui <ray.huang@xxxxxxx> >>>>>> --- >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++- >>>>>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 28 ++++++++++++++++++++++- >>>>>> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +- >>>>>> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- >>>>>> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 1 + >>>>>> 5 files changed, 34 insertions(+), 4 deletions(-) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>>>>> index a9a4319c24ae..189f9d7e190d 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c >>>>>> @@ -684,11 +684,14 @@ MODULE_PARM_DESC(debug_largebar, >>>>>> * Ignore CRAT table during KFD initialization. By default, KFD uses the ACPI CRAT >>>>>> * table to get information about AMD APUs. This option can serve as a workaround on >>>>>> * systems with a broken CRAT table. >>>>>> + * >>>>>> + * Default is auto (according to asic type, iommu_v2, and crat table, to decide >>>>>> + * whehter use CRAT) >>>>>> */ >>>>>> int ignore_crat; >>>>>> module_param(ignore_crat, int, 0444); >>>>>> MODULE_PARM_DESC(ignore_crat, >>>>>> - "Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)"); >>>>>> + "Ignore CRAT table during KFD initialization (0 = auto (default), 1 = ignore CRAT)"); >>>>>> >>>>>> /** >>>>>> * DOC: halt_if_hws_hang (int) >>>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>>>>> index 59557e3e206a..f8346d4402e2 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>>>>> @@ -22,6 +22,7 @@ >>>>>> >>>>>> #include <linux/pci.h> >>>>>> #include <linux/acpi.h> >>>>>> +#include <asm/processor.h> >>>>>> #include "kfd_crat.h" >>>>>> #include "kfd_priv.h" >>>>>> #include "kfd_topology.h" >>>>>> @@ -740,6 +741,30 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev, >>>>>> return 0; >>>>>> } >>>>>> >>>>>> + >>>>>> +#ifdef CONFIG_ACPI >>>>>> +static void kfd_setup_ignore_crat_option(void) >>>>>> +{ >>>>>> + >>>>>> + if (ignore_crat) >>>>>> + return; >>>>>> + >>>>>> +#ifndef KFD_SUPPORT_IOMMU_V2 >>>>>> + ignore_crat = 1; >>>>>> +#else >>>>>> + ignore_crat = 0; >>>>>> +#endif >>>>>> + >>>>>> + /* Renoir use the fallback path to align with existed thunk */ >>>>> Are you sure you need special code for Renoir here? For Renoir the >>>>> dev->device_info already treats it as a dGPU and always has. >>>> Renoir also is an APU, in other words, we might have got the correct CRAT >>>> table from SBIOS (the CRAT table in SBIOS for renoir is broken so far). If >>>> we had got CRAT table, the kfd would create an APU node. That's not >>>> expected. >>> kfd_assign_gpu will not assign a Renoir GPU as the APU from the CRAT >>> table because gpu->device_info->needs_iommu_device is False for Renoir. >>> So Renoir will always show up in the topology as its own discrete GPU node. >>> >>> How does this work today? Renoir is already treated as a dGPU. But the >>> CPU node info (/sys/class/kfd/kfd/topology/nodes/0/properties) from the >>> CRAT table still shows GPU cores? >>> >> Yes, Renoir works well. In fact, I found the problem while I was enabling >> the dGPU path for raven before. Even I set needs_iommu_device as false in >> raven's device info. The kfd still creates the APU node. (in v1 patch) >> >> Let me rollback to check it again. >> > Hi Felix, > > I double check it again. If Renoir's ACPI CRAT were good, we wouldn't > create virtual CRAT table for Renoir at this moement. Then the pure CPU > node is unable to be created. That conflicted with needs_iommu_device flag > (we hardcode needs_iommu_device as false for renoir). Then will break the > user mode driver. (please see below rocminfo) > > rocminfo: /libhsakmt/src/topology.c:1079: topology_sysfs_get_node_props: Assertion `props->EngineId.ui32.Major' failed. > > So we probably would better have a specific handling to make sure Renoir > with virtual CRAT. OK. That would be a solution that works for Renoir only. I'd suggest trying a more general solution in node_show in kfd_topology.c. If we see an APU node (that has CPU and GPU cores) with no associated GPU (dev->gpu_id is 0 or dev->gpu is NULL), we can report it as a pure CPU node to user mode by just reporting the simd_count as 0. Regards, Felix > > Thanks, > Ray _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx