Re: [PATCH 7/7] drm/i915/gem: Acquire all vma/objects under reservation_ww_class

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Fri, 26 Jun 2020 12:10:38 +0100

Quoting Christian König (2020-06-26 09:54:19)
> Am 26.06.20 um 10:10 schrieb Chris Wilson:
> > Quoting Chris Wilson (2020-06-25 18:42:41)
> >> Quoting Christian König (2020-06-25 16:47:09)
> >>> Am 25.06.20 um 17:10 schrieb Chris Wilson:
> >>>> We have the DAG of fences, we can use that information to avoid adding
> >>>> an implicit coupling between execution contexts.
> >>> No, we can't. And it sounds like you still have not understood the
> >>> underlying problem.
> >>>
> >>> See this has nothing to do with the fences itself or their DAG.
> >>>
> >>> When you depend on userspace to do another submission so your fence can
> >>> start processing you end up depending on whatever userspace does.
> >> HW dependency on userspace is explicit in the ABI and client APIs, and
> >> the direct control userspace has over the HW.
> >>
> >>> This in turn means when userspace calls a system call (or does page
> >>> fault) it is possible that this ends up in the reclaim code path.
> >> We have both said the very same thing.
> 
> Then I'm really wondering why you don't come to the same conclusion :)
> 
> >>   
> >>> And while we want to avoid it both Daniel and I already discussed this
> >>> multiple times and we agree it is still a must have to be able to do
> >>> fence waits in the reclaim code path.
> >> But came to the opposite conclusion. For doing that wait harms the
> >> unrelated caller and the reclaim is opportunistic. There is no need for
> >> that caller to reclaim that page, when it can have any other. Why did you
> >> even choose that page to reclaim? Inducing latency in the caller is a bug,
> >> has been reported previously as a bug, and still considered a bug. [But at
> >> the end of the day, if the system is out of memory, then you have to pick
> >> a victim.]
> 
> Correct. But this is also not limited to the reclaim path as any kernel 
> system call and page fault can cause a problem as well.

Yes. Hence the effort to avoid blocking and implicit waits in those paths,
and why flagging those waits is better than accepting them. The necessary
evil should be annotated; everything that is unnecessary should be
avoided.

And it is the user->kernel entry points that matter, as they are
uncontrolled; directly nesting execution contexts, by contrast, is
controlled.

And yes, direct reclaim is the easiest and most obvious place to avoid
unbounded waits inside unknown contexts.
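
As a rough userspace sketch of that opportunistic behaviour: reclaim only
ever takes a page that is idle right now and never sleeps on a fence. The
toy_* names are hypothetical stand-ins, not i915 or core mm code, and a
real shrinker is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a fence; not the kernel's struct dma_fence. */
struct toy_fence {
	bool signaled;
};

/* Hypothetical reclaimable page, optionally guarded by a fence. */
struct toy_page {
	struct toy_fence *busy;	/* NULL if nothing is using the page */
};

/* Non-blocking check: true if the page can be taken right now. */
static bool toy_page_is_idle(const struct toy_page *page)
{
	return !page->busy || page->busy->signaled;
}

/*
 * Opportunistic reclaim: return the index of the first idle page, or -1
 * if every candidate would require a wait. Nothing here sleeps on a
 * fence; a blocking fallback, if one is needed at all, would be a
 * separate and explicitly annotated path.
 */
static int toy_reclaim_one(const struct toy_page *pages, size_t count)
{
	assert(pages || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (toy_page_is_idle(&pages[i]))
			return (int)i;
	}
	return -1;
}
```

With a busy page followed by an idle one, toy_reclaim_one() picks the idle
page and the caller never inherits the latency of the busy page's fence.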

> In other words "fence -> userspace -> page fault -> fence" or "fence -> 
> userspace -> system call -> fence" can easily cause the same problem and 
> that is not avoidable.
> 
> > An example
> >
> > Thread A                              Thread B
> >
> >       submit(VkCmdWaitEvents)
> >       recvfrom(ThreadB)       ...     sendto(ThreadB)
> >                                       \- alloc_page
> >                                        \- direct reclaim
> >                                         \- dma_fence_wait(A)
> >       VkSetEvent()
> >
> > Regardless of that actual deadlock, waiting on an arbitrary fence incurs
> > an unbounded latency which is unacceptable for direct reclaim.
> >
> > Online debugging can indefinitely suspend fence signaling, and the only
> > guarantee we make of forward progress, in some cases, is process
> > termination.
> 
> And exactly that is what doesn't work. You don't have any forward 
> progress any more because you ran into a software deadlock.

Only one side is halted. Everything on that side comes to a grinding
halt.

What about checkpoint/restore, suspend/resume? There we need to suspend
all execution, move all the resources to one side, then put everything
back, without cancelling the fences. Same halting problem, no?

We also do similar for resets. Suspend the hanging context, move it and
all dependent execution off to one side; record what we can, clean up
what we have to, then move what remains of the execution back to finish
signaling.

> In other words the signaling of a fence depends on the welfare of 
> userspace. You can try to kill userspace, but this can wait for the 
> fence you try to signal in the first place.

The only scenario that fits what you are describing here [userspace
ignoring a signal] is if you used an uninterruptible wait. Under what
circumstances during normal execution would you do that? If it is
someone else's wait, that is a bug outside of our control.

But if you have chosen to cancel the fences, there is nothing to stop
the signaling.

> See the difference to a deadlock on the GPU is that you can always 
> kill a running job or process even if it is stuck with something else. 
> But if the kernel is deadlocked with itself you can't kill the process 
> any more, the only option left to get cleanly out of this is to reboot 
> the kernel.

However, I say that is under our control. We know what fences are in an
execution context, just as easily as we know that we are inside an
execution context. And yes, the easiest, and most restrictive, way to
control it is to say don't bother.
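
A minimal sketch of what knowing the fences in an execution context could
buy us: before waiting, walk the dependency DAG and refuse any wait on a
fence that itself depends on the fence we are responsible for signaling.
The ctx_* names and the fixed-size dependency array are assumptions made
for illustration, not an existing kernel interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CTX_MAX_DEPS 4

/* Hypothetical fence node: records only which fences it waits on. */
struct ctx_fence {
	const struct ctx_fence *deps[CTX_MAX_DEPS];
	size_t num_deps;
};

/* True if @fence is @target, or transitively waits on @target. */
static bool ctx_fence_depends_on(const struct ctx_fence *fence,
				 const struct ctx_fence *target)
{
	assert(fence->num_deps <= CTX_MAX_DEPS);

	if (fence == target)
		return true;

	/* The graph is a DAG, so this recursion terminates. */
	for (size_t i = 0; i < fence->num_deps; i++) {
		if (ctx_fence_depends_on(fence->deps[i], target))
			return true;
	}
	return false;
}

/*
 * @current is the fence the running context must still signal. Waiting
 * on @target is only allowed if @target does not need @current to
 * signal first; otherwise the wait could never complete.
 */
static bool ctx_may_wait(const struct ctx_fence *current,
			 const struct ctx_fence *target)
{
	return !ctx_fence_depends_on(target, current);
}
```

Given a chain where c waits on b and b waits on a, the context signaling a
must not wait on c, while the context signaling c may wait on a.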

> The only way to avoid this would be to never ever wait for the fence in 
> the kernel and then your whole construct is not useful any more.

I advocate moving as much as is feasible into the parallelised pipeline;
some waits are required by userspace as a necessary evil.

> I'm running out of ideas how to explain what the problem is here....

Oh, we agree on the problem; we appear to disagree on whether the implicit
waits themselves are a serious existing problem, one worth the effort to
avoid or, at least, mitigate.
-Chris