On Wed, Feb 7, 2018 at 2:30 AM, Felix Kuehling <felix.kuehling at amd.com> wrote: > On 2018-02-06 03:53 AM, Oded Gabbay wrote: >> On Mon, Feb 5, 2018 at 9:00 PM, Christian König >> <ckoenig.leichtzumerken at gmail.com> wrote: >>> Looks good to me on first glance. >>> >>> You probably don't mind that I'm going to pull a good part of that into >>> amdgpu as next step? >>> >> That indeed looks better then the first approach. >> Felix, I've applied all other patches from the dGPU topology patchset. >> Could you send this new patch after you tested it ? > > Yes. I've fixed and tested it today on CZ and Fiji and I'm rebasing > everything on your updated branch right now. > > I also have some fixes and updates in the GPUVM patch series that I'll > send out again after rebasing. One thing to note is, that > amdgpu_amdkfd_gpuvm.c will have to deal with conflicts at some point. > The amdgpu_bo_create function removed one parameter, and some structure > members were renamed. > > If you submit the amdgpu changes through your branch, either you or Alex > will need to fix that up at some point, depending on who gets to push to > Dave first. Alternatively, I can submit the amdgpu changes through > Alex's tree, but then you'll need to wait for Alex to push them to Dave > before you can apply the amdkfd changes on top of them. > > Which way do you prefer? I don't think you should split it up. Usually Alex is getting to Dave before me, so either I will fix it, or Dave will fix it ;) Oded > > Regards, > Felix > >> Thanks. >> >> >> Christian, I'm going to pull this patch (after its tested and sent >> formally) to amdkfd next for 4.17, so if you will pull it to amdgpu we >> will have a collision. >> >> Oded >> >> >>> Regards, >>> Christian. >>> >>> >>> Am 03.02.2018 um 03:29 schrieb Felix Kuehling: >>> >>> The attached patch is my attempt to keep most of the IOMMU code in one >>> place (new kfd_iommu.c) to avoid #ifdefs all over the place. This way I >>> can still conditionally compile a bunch of KFD code that is only needed >>> for IOMMU handling, with stub functions for kernel configs without IOMMU >>> support. About 300 lines of conditionally compiled code got moved to >>> kfd_iommu.c. >>> >>> The only piece I didn't move into kfd_iommu.c is >>> kfd_signal_iommu_event. I prefer to keep that in kfd_events.c because it >>> doesn't call any IOMMU driver functions, and because it's closely >>> related to the rest of the event handling logic. It could be compiled >>> unconditionally, but it would be dead code without IOMMU support. >>> >>> And I moved pdd->bound to a place where it doesn't consume extra space >>> (on 64-bit systems due to structure alignment) instead of making it >>> conditional. >>> >>> This is only compile-tested for now. >>> >>> If you like this approach, I'll do more testing and squash it with "Make >>> IOMMUv2 code conditional". >>> >>> Regards, >>> Felix >>> >>> >>> On 2018-01-31 10:00 AM, Oded Gabbay wrote: >>> >>> On Wed, Jan 31, 2018 at 4:56 PM, Oded Gabbay <oded.gabbay at gmail.com> wrote: >>> >>> Hi Felix, >>> Please don't spread 19 #ifdefs throughout the code. >>> I suggest to put one #ifdef in linux/amd-iommu.h itself around all the >>> functions declarations and in the #else section put macros with empty >>> implementations. This is much more readable and maintainable. >>> >>> Oded >>> >>> To emphasize my point, there is a call to amd_iommu_bind_pasid in >>> kfd_bind_processes_to_device() which isn't wrapped with the #ifdef so >>> the compliation breaks. Putting the #ifdefs around the calls is simply >>> not scalable. >>> >>> Oded >>> >>> On Fri, Jan 5, 2018 at 12:17 AM, Felix Kuehling <Felix.Kuehling at amd.com> >>> wrote: >>> >>> dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on >>> ASIC information. Also allow building KFD without IOMMUv2 support. >>> This is still useful for dGPUs and prepares for enabling KFD on >>> architectures that don't support AMD IOMMUv2. >>> >>> Signed-off-by: Felix Kuehling <Felix.Kuehling at amd.com> >>> --- >>> drivers/gpu/drm/amd/amdkfd/Kconfig | 2 +- >>> drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 8 +++- >>> drivers/gpu/drm/amd/amdkfd/kfd_device.c | 62 >>> +++++++++++++++++++++---------- >>> drivers/gpu/drm/amd/amdkfd/kfd_events.c | 2 + >>> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 5 +++ >>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 17 ++++++--- >>> drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 2 + >>> drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 2 + >>> 8 files changed, 74 insertions(+), 26 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig >>> b/drivers/gpu/drm/amd/amdkfd/Kconfig >>> index bc5a294..5bbeb95 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/Kconfig >>> +++ b/drivers/gpu/drm/amd/amdkfd/Kconfig >>> @@ -4,6 +4,6 @@ >>> >>> config HSA_AMD >>> tristate "HSA kernel driver for AMD GPU devices" >>> - depends on DRM_AMDGPU && AMD_IOMMU_V2 && X86_64 >>> + depends on DRM_AMDGPU && X86_64 >>> help >>> Enable this if you want to use HSA features on AMD GPU devices. >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>> index 2bc2816..3478270 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c >>> @@ -22,7 +22,9 @@ >>> >>> #include <linux/pci.h> >>> #include <linux/acpi.h> >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> #include <linux/amd-iommu.h> >>> +#endif >>> #include "kfd_crat.h" >>> #include "kfd_priv.h" >>> #include "kfd_topology.h" >>> @@ -1037,15 +1039,17 @@ static int kfd_create_vcrat_image_gpu(void >>> *pcrat_image, >>> struct crat_subtype_generic *sub_type_hdr; >>> struct crat_subtype_computeunit *cu; >>> struct kfd_cu_info cu_info; >>> - struct amd_iommu_device_info iommu_info; >>> int avail_size = *size; >>> uint32_t total_num_of_cu; >>> int num_of_cache_entries = 0; >>> int cache_mem_filled = 0; >>> int ret = 0; >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> + struct amd_iommu_device_info iommu_info; >>> const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP | >>> AMD_IOMMU_DEVICE_FLAG_PRI_SUP | >>> AMD_IOMMU_DEVICE_FLAG_PASID_SUP; >>> +#endif >>> struct kfd_local_mem_info local_mem_info; >>> >>> if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU) >>> @@ -1106,12 +1110,14 @@ static int kfd_create_vcrat_image_gpu(void >>> *pcrat_image, >>> /* Check if this node supports IOMMU. During parsing this flag will >>> * translate to HSA_CAP_ATS_PRESENT >>> */ >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> iommu_info.flags = 0; >>> if (amd_iommu_device_info(kdev->pdev, &iommu_info) == 0) { >>> if ((iommu_info.flags & required_iommu_flags) == >>> required_iommu_flags) >>> cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT; >>> } >>> +#endif >>> >>> crat_table->length += sub_type_hdr->length; >>> crat_table->total_entries++; >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c >>> index fafe971..5205b34 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c >>> @@ -20,7 +20,9 @@ >>> * OTHER DEALINGS IN THE SOFTWARE. >>> */ >>> >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> #include <linux/amd-iommu.h> >>> +#endif >>> #include <linux/bsearch.h> >>> #include <linux/pci.h> >>> #include <linux/slab.h> >>> @@ -31,6 +33,7 @@ >>> >>> #define MQD_SIZE_ALIGNED 768 >>> >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> static const struct kfd_device_info kaveri_device_info = { >>> .asic_family = CHIP_KAVERI, >>> .max_pasid_bits = 16, >>> @@ -41,6 +44,7 @@ static const struct kfd_device_info kaveri_device_info = { >>> .num_of_watch_points = 4, >>> .mqd_size_aligned = MQD_SIZE_ALIGNED, >>> .supports_cwsr = false, >>> + .needs_iommu_device = true, >>> .needs_pci_atomics = false, >>> }; >>> >>> @@ -54,8 +58,10 @@ static const struct kfd_device_info carrizo_device_info = >>> { >>> .num_of_watch_points = 4, >>> .mqd_size_aligned = MQD_SIZE_ALIGNED, >>> .supports_cwsr = true, >>> + .needs_iommu_device = true, >>> .needs_pci_atomics = false, >>> }; >>> +#endif >>> >>> struct kfd_deviceid { >>> unsigned short did; >>> @@ -64,6 +70,7 @@ struct kfd_deviceid { >>> >>> /* Please keep this sorted by increasing device id. */ >>> static const struct kfd_deviceid supported_devices[] = { >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> { 0x1304, &kaveri_device_info }, /* Kaveri */ >>> { 0x1305, &kaveri_device_info }, /* Kaveri */ >>> { 0x1306, &kaveri_device_info }, /* Kaveri */ >>> @@ -91,6 +98,7 @@ static const struct kfd_deviceid supported_devices[] = { >>> { 0x9875, &carrizo_device_info }, /* Carrizo */ >>> { 0x9876, &carrizo_device_info }, /* Carrizo */ >>> { 0x9877, &carrizo_device_info } /* Carrizo */ >>> +#endif >>> }; >>> >>> static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size, >>> @@ -161,6 +169,7 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, >>> return kfd; >>> } >>> >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> static bool device_iommu_pasid_init(struct kfd_dev *kfd) >>> { >>> const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP | >>> @@ -231,6 +240,7 @@ static int iommu_invalid_ppr_cb(struct pci_dev *pdev, >>> int pasid, >>> >>> return AMD_IOMMU_INV_PRI_RSP_INVALID; >>> } >>> +#endif /* CONFIG_AMD_IOMMU_V2 */ >>> >>> static void kfd_cwsr_init(struct kfd_dev *kfd) >>> { >>> @@ -321,12 +331,14 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd, >>> goto device_queue_manager_error; >>> } >>> >>> - if (!device_iommu_pasid_init(kfd)) { >>> - dev_err(kfd_device, >>> - "Error initializing iommuv2 for device %x:%x\n", >>> - kfd->pdev->vendor, kfd->pdev->device); >>> - goto device_iommu_pasid_error; >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> + if (kfd->device_info->needs_iommu_device) { >>> + if (!device_iommu_pasid_init(kfd)) { >>> + dev_err(kfd_device, "Error initializing iommuv2\n"); >>> + goto device_iommu_pasid_error; >>> + } >>> } >>> +#endif >>> >>> kfd_cwsr_init(kfd); >>> >>> @@ -386,11 +398,16 @@ void kgd2kfd_suspend(struct kfd_dev *kfd) >>> >>> kfd->dqm->ops.stop(kfd->dqm); >>> >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> + if (!kfd->device_info->needs_iommu_device) >>> + return; >>> + >>> kfd_unbind_processes_from_device(kfd); >>> >>> amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL); >>> amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL); >>> amd_iommu_free_device(kfd->pdev); >>> +#endif >>> } >>> >>> int kgd2kfd_resume(struct kfd_dev *kfd) >>> @@ -405,19 +422,24 @@ int kgd2kfd_resume(struct kfd_dev *kfd) >>> static int kfd_resume(struct kfd_dev *kfd) >>> { >>> int err = 0; >>> - unsigned int pasid_limit = kfd_get_pasid_limit(); >>> >>> - err = amd_iommu_init_device(kfd->pdev, pasid_limit); >>> - if (err) >>> - return -ENXIO; >>> - amd_iommu_set_invalidate_ctx_cb(kfd->pdev, >>> - iommu_pasid_shutdown_callback); >>> - amd_iommu_set_invalid_ppr_cb(kfd->pdev, >>> - iommu_invalid_ppr_cb); >>> - >>> - err = kfd_bind_processes_to_device(kfd); >>> - if (err) >>> - goto processes_bind_error; >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> + if (kfd->device_info->needs_iommu_device) { >>> + unsigned int pasid_limit = kfd_get_pasid_limit(); >>> + >>> + err = amd_iommu_init_device(kfd->pdev, pasid_limit); >>> + if (err) >>> + return -ENXIO; >>> + amd_iommu_set_invalidate_ctx_cb(kfd->pdev, >>> + >>> iommu_pasid_shutdown_callback); >>> + amd_iommu_set_invalid_ppr_cb(kfd->pdev, >>> + iommu_invalid_ppr_cb); >>> + >>> + err = kfd_bind_processes_to_device(kfd); >>> + if (err) >>> + goto processes_bind_error; >>> + } >>> +#endif >>> >>> err = kfd->dqm->ops.start(kfd->dqm); >>> if (err) { >>> @@ -431,8 +453,10 @@ static int kfd_resume(struct kfd_dev *kfd) >>> >>> dqm_start_error: >>> processes_bind_error: >>> - amd_iommu_free_device(kfd->pdev); >>> - >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> + if (kfd->device_info->needs_iommu_device) >>> + amd_iommu_free_device(kfd->pdev); >>> +#endif >>> return err; >>> } >>> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> index 93aae5c..f770dc7 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c >>> @@ -837,6 +837,7 @@ static void lookup_events_by_type_and_signal(struct >>> kfd_process *p, >>> } >>> } >>> >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid, >>> unsigned long address, bool is_write_requested, >>> bool is_execute_requested) >>> @@ -905,6 +906,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, >>> unsigned int pasid, >>> mutex_unlock(&p->event_mutex); >>> kfd_unref_process(p); >>> } >>> +#endif /* CONFIG_AMD_IOMMU_V2_MODULE */ >>> >>> void kfd_signal_hw_exception_event(unsigned int pasid) >>> { >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h >>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h >>> index eebfb1e..9f4766c 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h >>> @@ -158,6 +158,7 @@ struct kfd_device_info { >>> uint8_t num_of_watch_points; >>> uint16_t mqd_size_aligned; >>> bool supports_cwsr; >>> + bool needs_iommu_device; >>> bool needs_pci_atomics; >>> }; >>> >>> @@ -617,9 +618,11 @@ void kfd_unref_process(struct kfd_process *p); >>> >>> struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev, >>> struct kfd_process *p); >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> int kfd_bind_processes_to_device(struct kfd_dev *dev); >>> void kfd_unbind_processes_from_device(struct kfd_dev *dev); >>> void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned int >>> pasid); >>> +#endif >>> struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev, >>> struct kfd_process >>> *p); >>> struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev >>> *dev, >>> @@ -784,9 +787,11 @@ int kfd_wait_on_events(struct kfd_process *p, >>> uint32_t *wait_result); >>> void kfd_signal_event_interrupt(unsigned int pasid, uint32_t partial_id, >>> uint32_t valid_id_bits); >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> void kfd_signal_iommu_event(struct kfd_dev *dev, >>> unsigned int pasid, unsigned long address, >>> bool is_write_requested, bool is_execute_requested); >>> +#endif >>> void kfd_signal_hw_exception_event(unsigned int pasid); >>> int kfd_set_event(struct kfd_process *p, uint32_t event_id); >>> int kfd_reset_event(struct kfd_process *p, uint32_t event_id); >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> index a22fb071..1d0e02c 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> @@ -173,14 +173,17 @@ static void kfd_process_wq_release(struct work_struct >>> *work) >>> { >>> struct kfd_process *p = container_of(work, struct kfd_process, >>> release_work); >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> struct kfd_process_device *pdd; >>> >>> pr_debug("Releasing process (pasid %d) in workqueue\n", p->pasid); >>> >>> list_for_each_entry(pdd, &p->per_device_data, per_device_list) { >>> - if (pdd->bound == PDD_BOUND) >>> + if (pdd->bound == PDD_BOUND && >>> + pdd->dev->device_info->needs_iommu_device) >>> amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid); >>> } >>> +#endif >>> >>> kfd_process_destroy_pdds(p); >>> >>> @@ -421,7 +424,6 @@ struct kfd_process_device >>> *kfd_bind_process_to_device(struct kfd_dev *dev, >>> struct kfd_process >>> *p) >>> { >>> struct kfd_process_device *pdd; >>> - int err; >>> >>> pdd = kfd_get_process_device_data(dev, p); >>> if (!pdd) { >>> @@ -436,9 +438,14 @@ struct kfd_process_device >>> *kfd_bind_process_to_device(struct kfd_dev *dev, >>> return ERR_PTR(-EINVAL); >>> } >>> >>> - err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread); >>> - if (err < 0) >>> - return ERR_PTR(err); >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> + if (dev->device_info->needs_iommu_device) { >>> + int err = amd_iommu_bind_pasid(dev->pdev, p->pasid, >>> + p->lead_thread); >>> + if (err < 0) >>> + return ERR_PTR(err); >>> + } >>> +#endif >>> >>> pdd->bound = PDD_BOUND; >>> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c >>> index c6a7609..f57c305 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c >>> @@ -875,6 +875,7 @@ static void find_system_memory(const struct dmi_header >>> *dm, >>> */ >>> static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev) >>> { >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> struct kfd_perf_properties *props; >>> >>> if (amd_iommu_pc_supported()) { >>> @@ -886,6 +887,7 @@ static int kfd_add_perf_to_topology(struct >>> kfd_topology_device *kdev) >>> amd_iommu_pc_get_max_counters(0); /* assume one >>> iommu */ >>> list_add_tail(&props->list, &kdev->perf_props); >>> } >>> +#endif >>> >>> return 0; >>> } >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h >>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h >>> index 53fca1f..111fda2 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h >>> @@ -183,8 +183,10 @@ struct kfd_topology_device *kfd_create_topology_device( >>> struct list_head *device_list); >>> void kfd_release_topology_device_list(struct list_head *device_list); >>> >>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) >>> extern bool amd_iommu_pc_supported(void); >>> extern u8 amd_iommu_pc_get_max_banks(u16 devid); >>> extern u8 amd_iommu_pc_get_max_counters(u16 devid); >>> +#endif >>> >>> #endif /* __KFD_TOPOLOGY_H__ */ >>> -- >>> 2.7.4 >>> >>> >>> >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx at lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >>> >>> >