Re: HMM fence (was Re: [PATCH 00/35] Add HMM-based SVM memory manager to KFD)

On 14.01.21 at 22:13, Felix Kuehling wrote:
On 2021-01-14 at 11:51 a.m., Jerome Glisse wrote:
On Thu, Jan 14, 2021 at 02:37:36PM +0100, Christian König wrote:
On 14.01.21 at 12:52, Daniel Vetter wrote:
[SNIP]
I had a new idea; I wanted to think more about it but have not yet,
anyway here it is. Add a new callback to dma_fence which asks the
question: can it deadlock? Any time a GPU driver has a pending page
fault (i.e. something calling into the mm) it answers yes, otherwise
no. The GPU shrinker would ask the question before waiting on any
dma_fence and back off if it gets a yes. The shrinker can still try other
dma-buf objects for which it does not get a yes on the associated fence.
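
A minimal sketch of that idea, assuming a hypothetical may_deadlock
callback in dma_fence_ops and made-up helpers (nothing below exists in
the kernel today), could look like this:

/* Hypothetical extension of dma_fence_ops: a driver with pending
 * recoverable page faults (i.e. something that may call back into
 * the mm) answers true here. */
bool (*may_deadlock)(struct dma_fence *fence);

/* In the GPU shrinker: ask before waiting and back off on a "yes". */
static bool shrinker_try_evict(struct my_bo *bo)	/* my_bo is made up */
{
	struct dma_fence *fence = my_bo_get_fence(bo);	/* made-up helper */

	if (fence && fence->ops->may_deadlock &&
	    fence->ops->may_deadlock(fence))
		return false;	/* back off, try the next dma-buf object */

	if (fence)
		dma_fence_wait(fence, false);
	return my_bo_evict(bo);	/* made-up helper */
}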

This does not solve the MMU notifier case; for that you would just
invalidate the GEM userptr object (with a flag, but without releasing
the page refcount), and you would not wait for the GPU (i.e. no
dma_fence wait in that code path anymore). The userptr API never really
made the contract that it will always be in sync with the mm's view of
the world, so if a different page gets remapped to the same virtual
address while the GPU is still working with the old pages it should not
be an issue (it would not be in our usage of userptr for compositors
and whatnot).
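
For illustration, a userptr invalidate callback along those lines might
look roughly like the following (the bo structure and its flag are made
up; only the mmu_interval_notifier API is real):

/* Sketch: invalidate the userptr mapping without any dma_fence wait.
 * The pages stay referenced until the GPU work retires; we only mark
 * the mapping stale so the next submission re-pins the pages. */
static bool my_userptr_invalidate(struct mmu_interval_notifier *mni,
				  const struct mmu_notifier_range *range,
				  unsigned long cur_seq)
{
	struct my_userptr_bo *bo =
		container_of(mni, struct my_userptr_bo, notifier);

	mmu_interval_set_seq(mni, cur_seq);
	WRITE_ONCE(bo->invalid, true);	/* flag only, no GPU wait */

	return true;
}
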
The current working idea in my mind goes in a similar direction.

But instead of a callback I'm adding a completely new class of HMM fences.

Waiting in the MMU notifier, scheduler, TTM etc. is only allowed for
dma_fences, and HMM fences are ignored in container objects.

When you handle an implicit or explicit synchronization request from
userspace you need to block for HMM fences to complete before taking any
resource locks.
Isn't that what I call gang scheduling? I.e. you either run in HMM
mode, or in legacy fencing mode (whether implicit or explicit doesn't
really matter I think). By forcing that split we avoid the problem,
but it means occasionally full stalls on mixed workloads.

But that's not what Jerome wants (afaiui at least), I think his idea
is to track the reverse dependencies of all the fences floating
around, and then skip evicting an object if you have to wait for any
fence that is problematic for the current calling context. And I don't
think that's very feasible in practice.

So what kind of hmm fences do you have in mind here?
It's a bit more relaxed than your gang schedule.

See, the requirements are as follows:

1. dma_fences never depend on hmm_fences.
2. hmm_fences can never preempt dma_fences.
3. dma_fences must be able to preempt hmm_fences or we always reserve enough
hardware resources (CUs) to guarantee forward progress of dma_fences.

Critical sections are MMU notifiers, page faults, GPU schedulers and
dma_reservation object locks.

4. It is valid to wait for dma_fences in critical sections.
5. It is not valid to wait for hmm_fences in critical sections.
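
As a sketch of rules 4 and 5, assuming hmm_fences were a separate fence
class recognizable by their ops (hmm_fence_ops and the helper below are
made up for illustration):

/* Sketch: waiting inside a critical section (MMU notifier, page fault
 * handling, GPU scheduler, under a reservation lock) is only legal on
 * dma_fences (rule 4); hmm_fences are rejected (rule 5), because they
 * may stall behind a page fault that needs this very critical section. */
extern const struct dma_fence_ops hmm_fence_ops;	/* assumed new class */

static long critical_wait(struct dma_fence *fence, bool intr)
{
	if (WARN_ON(fence->ops == &hmm_fence_ops))
		return -EDEADLK;

	return dma_fence_wait(fence, intr);
}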

Fence creation either happens during command submission or by adding
something like a barrier or signal command to your userspace queue.

6. If we have an hmm_fence as implicit or explicit dependency for creating a
dma_fence we must wait for that before taking any locks or reserving
resources.
7. If we have a dma_fence as implicit or explicit dependency for creating an
hmm_fence we can wait later on. So busy waiting or special WAIT hardware
commands are valid.
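
A rough sketch of that asymmetry, with made-up submission helpers (none
of the names below are real APIs):

/* Rule 6: before creating a dma_fence, resolve any hmm_fence
 * dependencies on the CPU, prior to taking locks or reserving
 * memory/CU resources for the job. */
static int submit_with_dma_fence(struct my_job *job)
{
	long ret;
	int i;

	for (i = 0; i < job->num_deps; i++) {
		if (job->deps[i]->ops == &hmm_fence_ops) {
			ret = dma_fence_wait(job->deps[i], true);
			if (ret)
				return ret;
		}
	}
	return my_lock_and_push(job);	/* made up: locks resv, pushes job */
}

/* Rule 7: when creating an hmm_fence, dma_fence dependencies can be
 * waited for later, e.g. via a WAIT command in the userspace queue. */
static int submit_with_hmm_fence(struct my_job *job)
{
	my_emit_hw_waits(job);		/* made up: hardware waits on deps */
	return my_push_user_queue(job);	/* made up */
}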

This avoids hard cuts, i.e. we can mix hmm_fences and dma_fences on the
hardware at the same time.

In other words we can have a high priority gfx queue running jobs based on
dma_fences and a low priority compute queue running jobs based on
hmm_fences.

Only when we switch from hmm_fence to dma_fence do we need to block the
submission until all the necessary resources (both memory and CUs)
are available.

This is somewhat an extension to your gang submit idea.
What is an hmm_fence? You should not have fences with HMM at all.
So I am kind of scared now.
I kind of had the same question trying to follow Christian and Daniel's
discussion. I think an HMM fence would be any fence resulting from the
completion of a user mode operation in a context with HMM-based memory
management that may stall indefinitely due to page faults.

It was more of a placeholder for something which can be used for inter-process synchronization.

But on a hardware engine that cannot preempt page-faulting work and has
not reserved resources to guarantee forward progress for kernel jobs, I
think all fences will need to be HMM fences, because any work submitted
to such an engine can stall by getting stuck behind a stalled user mode
operation.

So for example, you have a DMA engine that can preempt during page
faults, but a graphics engine that cannot. Then work submitted to the
DMA engine can use dma_fence. But work submitted to the graphics engine
must use hmm_fence. To avoid deadlocks, dma_fences must never depend on
hmm_fences and resolution of page faults must never depend on hmm_fences.
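
As a small sketch of that split, one could imagine picking the fence
class per engine (everything below is made up for illustration):

/* Engines that can preempt page-faulting work, or that have reserved
 * CUs for kernel jobs, may signal dma_fences; all other engines must
 * signal hmm_fences. */
static const struct dma_fence_ops *fence_class_for(struct my_engine *e)
{
	if (e->can_preempt_page_faults || e->has_reserved_cus)
		return &my_dma_fence_ops;	/* regular dma_fence */
	return &hmm_fence_ops;			/* HMM fence */
}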

Yeah, it's a bit more complicated but in general that fits.

Regards,
Christian.


Regards,
   Felix


Cheers,
Jérôme





