[PATCH 14/25] drm/amdkfd: Populate DRM render device minor

felix.kuehling@xxxxxxx (Felix Kuehling) · Tue, 13 Feb 2018 14:22:36 -0500

On 2018-02-13 01:45 PM, Christian König wrote:
> On 13.02.2018 at 17:56, Felix Kuehling wrote:
>> [SNIP]
>> Each process gets a whole page of the doorbell aperture assigned to it.
>> The assumption is that amdgpu only uses the first page of the doorbell
>> aperture, so KFD uses all the rest. On GFX8 and before, the queue ID is
>> used as the offset into the doorbell page. On GFX9 the hardware does
>> some engine-specific doorbell routing, so we added another layer of
>> doorbell management that's decoupled from the queue ID.
>>
>> Either way, an entire doorbell page gets mapped into user mode and user
>> mode knows the offset of the doorbells for specific queues. The mapping
>> is currently handled by kfd_mmap in kfd_chardev.c.
>
> Ok, wait a second. Taking a look at kfd_doorbell_mmap() it almost
> looks like you map different doorbells with the same offset depending
> on which process is calling this.
>
> Is that correct? If yes then that would be illegal and a problem if
> I'm not completely mistaken.

Why is that a problem? Each process has its own file descriptor. The
mapping is done using io_remap_pfn_range in kfd_doorbell_mmap. This is
nothing new. It's been done like this forever even on Kaveri and Carrizo.
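
To illustrate the layout described above, here is a rough user-space model,
not the actual KFD code. The page size, the 32-bit doorbell size, the
"page 0 belongs to amdgpu" numbering and all function names are assumptions
made for the sketch. It shows the GFX8-and-earlier scheme where the queue ID
is the offset into the process's doorbell page, so two processes use the same
offset in their own mapping but reach different physical doorbells.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
#define DOORBELL_PAGE_SIZE 4096u /* one page per process */
#define DOORBELL_SIZE      4u    /* 32-bit doorbell slots */

/*
 * Hypothetical numbering: page 0 of the aperture is left to amdgpu,
 * so process N (counting from 0) gets page N + 1.
 */
static uint64_t process_doorbell_page(uint64_t aperture_base,
                                      unsigned int process_index)
{
    return aperture_base +
           ((uint64_t)process_index + 1) * DOORBELL_PAGE_SIZE;
}

/* Offset of a queue's doorbell inside the page mapped into user mode. */
static uint64_t queue_doorbell_offset(unsigned int queue_id)
{
    return (uint64_t)queue_id * DOORBELL_SIZE;
}

/* Physical address that a write to that offset ends up at. */
static uint64_t queue_doorbell_addr(uint64_t aperture_base,
                                    unsigned int process_index,
                                    unsigned int queue_id)
{
    return process_doorbell_page(aperture_base, process_index) +
           queue_doorbell_offset(queue_id);
}
```

In this model, queue 3 of process 0 and queue 3 of process 1 share the same
in-page offset but resolve to addresses one page apart.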

>
>>> Do you simply assume that after evicting a process it always needs to
>>> be restarted without checking if it actually does something? Or how
>>> does that work?
>> Exactly.
>
> Ok, understood. Well that limits the usefulness of the whole eviction
> drastically.
>
>> With the later addition of GPU self-dispatch, a page-fault based
>> mechanism wouldn't work any more. We have to restart the queues blindly
>> with a timer. See evict_process_worker, which schedules the restore with
>> a delayed worker.
>>
>> The user mode queue ABI specifies that user mode update both the
>> doorbell and a WPTR in memory. When we restart queues we (or the CP
>> firmware) use the WPTR to make sure we catch up with any work that was
>> submitted while the queues were unmapped.
>
> Putting cross-process work dispatch aside for a moment, GPU
> self-dispatch works only when there is work on the GPU running.
>
> So you can still check if there is some work pending after you
> unmapped everything and only restart the queues when there is new work
> based on the page fault.
>
> In other words either there is work pending and it doesn't matter if
> it was sent by the GPU or by the CPU or there is no work pending and
> we can delay restarting everything until there is.

That sounds like a useful optimization.
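
A rough sketch of that check, assuming each queue exposes a read pointer and
the in-memory WPTR mentioned above. The struct and function names are
hypothetical and do not correspond to existing KFD code. A queue counts as
having pending work when the two pointers differ, and the process would only
be restarted right away if at least one of its queues has pending work.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-queue snapshot taken after the queues are unmapped. */
struct queue_state {
    uint64_t rptr; /* how far the GPU has consumed the ring */
    uint64_t wptr; /* how far user mode (or the GPU) has written */
};

/* Work is pending when something was written but not yet consumed. */
static bool queue_has_pending_work(const struct queue_state *q)
{
    return q->rptr != q->wptr;
}

/*
 * Restart immediately only if some queue still has work; otherwise the
 * restore could be deferred until new work shows up.
 */
static bool process_needs_restart(const struct queue_state *queues,
                                  size_t num_queues)
{
    for (size_t i = 0; i < num_queues; i++) {
        if (queue_has_pending_work(&queues[i]))
            return true;
    }
    return false;
}
```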

Regards,
  Felix

>
> Regards,
> Christian.
>
>>
>> Regards,
>>   Felix
>>
>>> Regards,
>>> Christian.
>