This was to fix long event wait latency in applications. When an
application shader generates a lot of event interrupts in a short
period, the scheduled work has no chance to execute on the same
CPU core, so the event cannot be posted/returned to the
application threads that are waiting on it. Scheduling the work on
a core of the same NUMA node is to optimize cache usage in general.
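
To illustrate the selection, here is a rough userspace sketch of the
loop in kfd_queue_work(). It is not the driver code: NR_CPUS, the
cpu_node[] and cpu_online[] tables, next_online() and pick_cpu() are
all made-up stand-ins for nr_cpu_ids, cpu_to_node(), cpu_online_mask
and cpumask_next(), and the 8-CPU, 2-node topology is an assumption.

```c
#include <assert.h>

#define NR_CPUS 8	/* hypothetical stand-in for nr_cpu_ids */

/* Assumed topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static const int cpu_node[NR_CPUS] = { 0, 0, 0, 0, 1, 1, 1, 1 };

/* Assumed online mask: CPU 2 is offline, all others are online. */
static const int cpu_online[NR_CPUS] = { 1, 1, 0, 1, 1, 1, 1, 1 };

/*
 * Rough equivalent of cpumask_next(cpu, cpu_online_mask): return the
 * next online CPU after 'cpu', or NR_CPUS if there is none.
 */
static int next_online(int cpu)
{
	for (int i = cpu + 1; i < NR_CPUS; i++)
		if (cpu_online[i])
			return i;
	return NR_CPUS;
}

/*
 * Same shape as the loop in kfd_queue_work(): starting after the
 * current CPU, walk the online CPUs (wrapping around to 0) and stop
 * at the first one on the same node. If the walk gets all the way
 * back to 'cpu', the current CPU itself is returned.
 */
static int pick_cpu(int cpu)
{
	int new_cpu = cpu;

	assert(cpu >= 0 && cpu < NR_CPUS);

	do {
		new_cpu = next_online(new_cpu) % NR_CPUS;
		if (cpu_node[new_cpu] == cpu_node[cpu])
			break;
	} while (new_cpu != cpu);

	return new_cpu;
}
```

With these tables, pick_cpu(1) skips the offline CPU 2 and returns 3,
while pick_cpu(3) walks through node 1, wraps around and returns 0. In
both cases the work would land on a different core of the same node
than the one taking the interrupt.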
Regards
Philip
On 2023-06-27 11:42, Alex Deucher wrote:
+Felix, Philip

On Tue, Jun 27, 2023 at 4:42 AM Philipp Stanner <pstanner@xxxxxxxxxx> wrote:

Hello folks,

I'm currently trying to learn more about DRM and discovered the
following code sequence:

drivers/gpu/drm/amd/amdkfd/kfd_device.c, Line 824 on 6.4-rc7

static inline void kfd_queue_work(struct workqueue_struct *wq,
				  struct work_struct *work)
{
	int cpu, new_cpu;

	cpu = new_cpu = smp_processor_id();
	do {
		new_cpu = cpumask_next(new_cpu, cpu_online_mask) % nr_cpu_ids;
		if (cpu_to_node(new_cpu) == numa_node_id())
			break;
	} while (cpu != new_cpu);

	queue_work_on(new_cpu, wq, work);
}

/* This is called directly from KGD at ISR. */
void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
{
	uint32_t patched_ihre[KFD_MAX_RING_ENTRY_SIZE];
	bool is_patched = false;
	unsigned long flags;

	if (!kfd->init_complete)
		return;

	if (kfd->device_info.ih_ring_entry_size > sizeof(patched_ihre)) {
		dev_err_once(kfd_device, "Ring entry too small\n");
		return;
	}

	spin_lock_irqsave(&kfd->interrupt_lock, flags);

	if (kfd->interrupts_active
	    && interrupt_is_wanted(kfd, ih_ring_entry,
				   patched_ihre, &is_patched)
	    && enqueue_ih_ring_entry(kfd,
				     is_patched ? patched_ihre : ih_ring_entry))
		kfd_queue_work(kfd->ih_wq, &kfd->interrupt_work);

	spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
}

These functions seem to be exclusively invoked by
amdgpu_irq_dispatch() in amdgpu_irq.c

At first glance it seems to me that it's just a typical scenario
taking place here: Interrupt arises, interrupt submits work to wq,
then jumps back to sleep / former process execution context again.

What I don't understand is why it's apparently important to schedule
the work on a particular CPU.

It seems that the do-while in kfd_queue_work() is searching for a CPU
within the same NUMA-Node. Thus I suspect that this is done because
either
a) performance requires it or
b) the work-function needs access to something that's only available
   within the same node.

I suspect there is an interrupt-related reason why that particular
work should be enqueued on a specific CPU. Just by reading the code
alone I can't really figure out why precisely that's necessary, though.

Does someone have any hints for me? :)

Cheers,
Philipp