Am 2020-08-11 um 7:45 a.m. schrieb Huang Rui: > On Fri, Aug 07, 2020 at 11:00:38PM +0800, Kuehling, Felix wrote: >> Am 2020-08-07 um 4:25 a.m. schrieb Huang Rui: >>> We still have a few iommu issues which need to address, so force raven >>> as "dgpu" path for the moment. >>> >>> Will enable IOMMUv2 since the issues are fixed. >> Do you mean "_when_ the issues are fixed"? > Yes. > >> The current iommuv2 troubles aside, I think this change breaks existing >> user mode. The existing Thunk is not prepared to see Raven as a dGPU. So >> this is not something we want to do in an upstream Linux kernel change. > Do you mean it may break the thunk without setting "is_dgpu" flag in the > hsa_gfxip_table for raven? Exactly. Try running your modified KFD against an unchanged Thunk. If it fails, you cannot make that kernel change. > >> I suggest using the ignore_crat module parameter for the workaround >> instead. You may need to duplicate the raven_device_info and pick the >> right one depending on whether it is a dGPU or an APU. The only >> difference would be the need_iommu_device option. If ignore_crat is set, >> you can support Raven as a dGPU and require a corresponding Thunk change >> that conditionally support Raven as a dGPU. >> >> I think such a change would also be the right direction for supporting >> Raven more universally in the future. It can be extended to >> conditionally treat Raven as a dGPU automatically in some situations: >> >> * broken or missing CRAT table >> * IOMMUv2 disabled >> >> Those are all situations where the current driver is broken anyway (and >> always has been), so it would not be a kernel change that breaks >> existing user mode. > OK, I think I understand it. We should use a parameter/flag such as > ignore_crat or something else to inform the user mode which path we should > go, treat it as dgpu or apu. Then thunk can detect the flag from kernel to > know how to handle the case. Am I right? User mode should not look at the ignore_crat parameter. It is not part of the user-kernel API. The kernel parameter is to control kernel behaviour. The point is, that the default behaviour should not change, because that would break existing user mode. A fixed user mode can use existing topology information to figure out which way to go. If the CUs are in the CPU node, it's an APU. Otherwise it's a dGPU. > >> In addition the Thunk could be changed to downgrade a Raven APU to dGPU >> (by splitting the APU node into a separate CPU and dGPU node) if other >> dGPUs are present in the systems to disable all the APU-specific code >> paths and allow all the GPUs to work together seamlessly with SVM. > Thanks! Let me look at the thunk again. > > For now, based on your comments, I would like to update patch as: > > - Modify the ignore_crat parameter as tristate: "use", "ignore", and > "auto". Usually, by default is "auto = use", but in some special case > such as iommu v2 broken or no iommu v2 support yet (Renoir), we have to > set "auto = ignore". Then treat it as "dgpu". And if CRAT table is > broken/missing, we will fall back to treat it as "dgpu" as well. > > What do you think of it? If the CRAT is broken/missing or IOMMUv2 is not enabled, "use" is not a very useful setting. I think you only need two states: "auto" and "ignore". That's pretty much what we have now. Except we need more conditions to not use the CRAT in the "auto" case. Regards, Felix > > Thanks, > Ray > >> Regards, >> Felix >> >> >>> Signed-off-by: Huang Rui <ray.huang@xxxxxxx> >>> --- >>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 6 ++++++ >>> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++-- >>> 2 files changed, 8 insertions(+), 2 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>> index 6a250f8fcfb8..66d9f7280fe8 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>> @@ -22,6 +22,7 @@ >>> >>> #include <linux/pci.h> >>> #include <linux/acpi.h> >>> +#include <asm/processor.h> >>> #include "kfd_crat.h" >>> #include "kfd_priv.h" >>> #include "kfd_topology.h" >>> @@ -781,6 +782,11 @@ int kfd_create_crat_image_acpi(void **crat_image, size_t *size) >>> return -ENODATA; >>> } >>> >>> + /* Raven series will go with the fallback path to use virtual CRAT */ >>> + if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD && >>> + boot_cpu_data.x86 == 0x17) >>> + return -ENODATA; >>> + >>> pcrat_image = kmemdup(crat_table, crat_table->length, GFP_KERNEL); >>> if (!pcrat_image) >>> return -ENOMEM; >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c >>> index d5e790f046b4..93179c928371 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c >>> @@ -116,6 +116,7 @@ static const struct kfd_device_info carrizo_device_info = { >>> .num_xgmi_sdma_engines = 0, >>> .num_sdma_queues_per_engine = 2, >>> }; >>> +#endif >>> >>> static const struct kfd_device_info raven_device_info = { >>> .asic_family = CHIP_RAVEN, >>> @@ -128,13 +129,12 @@ static const struct kfd_device_info raven_device_info = { >>> .num_of_watch_points = 4, >>> .mqd_size_aligned = MQD_SIZE_ALIGNED, >>> .supports_cwsr = true, >>> - .needs_iommu_device = true, >>> + .needs_iommu_device = false, >>> .needs_pci_atomics = true, >>> .num_sdma_engines = 1, >>> .num_xgmi_sdma_engines = 0, >>> .num_sdma_queues_per_engine = 2, >>> }; >>> -#endif >>> >>> static const struct kfd_device_info hawaii_device_info = { >>> .asic_family = CHIP_HAWAII, _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx