Quoting Christian König (2020-06-25 13:59:16)
> Am 25.06.20 um 14:48 schrieb Chris Wilson:
> > Quoting Christian König (2020-06-25 09:11:35)
> >> Am 24.06.20 um 22:18 schrieb Chris Wilson:
> >>> Quoting Dave Airlie (2020-06-24 20:04:02)
> >>>> On Wed, 24 Jun 2020 at 07:19, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> >>>>> Quoting Dave Airlie (2020-06-23 22:01:24)
> >>>>>> On Tue, 23 Jun 2020 at 20:03, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> >>>>>>> Quoting Thomas Hellström (Intel) (2020-06-23 10:33:20)
> >>>>>>>> Hi, Chris!
> >>>>>>>>
> >>>>>>>> On 6/22/20 11:59 AM, Chris Wilson wrote:
> >>>>>>>>> In order to actually handle eviction and what not, we need to process
> >>>>>>>>> all the objects together under a common lock, reservation_ww_class. As
> >>>>>>>>> such, do a memory reservation pass after looking up the object/vma,
> >>>>>>>>> which then feeds into the rest of execbuf [relocation, cmdparsing,
> >>>>>>>>> flushing and ofc execution].
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>>>>>>>> ---
> >>>>>>>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 91 ++++++++++++++-----
> >>>>>>>>>  1 file changed, 70 insertions(+), 21 deletions(-)
> >>>>>>>>>
> >>>>>>>> Which tree is this against? The series doesn't apply cleanly against
> >>>>>>>> drm-tip?
> >>>>>>> It's continuing on from the scheduler patches, the bug fixes and the
> >>>>>>> iris-deferred-fence work. I thought throwing all of those old patches
> >>>>>>> into the pile would have been distracting.
> >>>>>>>
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>>> +static int eb_reserve_mm(struct i915_execbuffer *eb)
> >>>>>>>>> +{
> >>>>>>>>> +       const u64 idx = eb->context->timeline->fence_context;
> >>>>>>>>> +       struct ww_acquire_ctx acquire;
> >>>>>>>>> +       struct eb_vma *ev;
> >>>>>>>>> +       int err;
> >>>>>>>>> +
> >>>>>>>>> +       eb->mm_fence = __dma_fence_create_proxy(0, 0);
> >>>>>>>>> +       if (!eb->mm_fence)
> >>>>>>>>> +               return -ENOMEM;
> >>>>>>>> Where are the proxy fence functions defined?
> >>>>>>> In dma-fence-proxy.c ;)
> >>>>>> The dma-fence-proxy that Christian NAKed before?
> >>>>> I do not have an email from Christian about dma-fence-proxy in the last
> >>>>> 3 years it has been on the list.
> >>>> https://lore.kernel.org/dri-devel/aeb0373d-0583-d922-3b73-93668c27d177@amd.com/
> >>> Darn, I skimmed the thread title and thought it was just about the
> >>> timelines.
> >>>
> >>>> I'm assuming this was about patch 8 there, which to me looks like proxy
> >>>> fences, but maybe my threading is off reading that.
> >>> The deadlocks are easy to resolve. The fence is either signaled normally
> >>> by userspace; or they create a deadlock, which is rejected by checking the
> >>> DAG and the fence is signaled with an error (and the work cancelled, the
> >>> error propagated back to userspace if they kept the output fence around);
> >>> or userspace forgets entirely about the fence they were waiting on, in
> >>> which case it is signaled by closing the syncobjs [sadly not in error
> >>> though, I'm hoping to report EPIPE] on process termination.
> >> And exactly that concept is still a big NAK.
> >>
> >> The kernel memory management depends on dma_fences to be signaling as
> >> soon as they are existing.
> >>
> >> Just imagine what Daniel's dependency patches would splat out when you
> >> do something like this and correctly annotate the signaling code path.
> > Nothing at all. Forward progress of the waiter does not solely depend on
> > the signaler, just as in bc9c80fe01a2570a2fd78abbc492b377b5fda068.
> >
> >> Proxy fences, especially when they depend on userspace for signaling,
> >> are an absolute NO-GO.
> > We are in full control of the signaling and are able to cancel the pending
> > userspace operation, move it off to one side and shut down the HW,
> > whatever. We can and do do dependency analysis of the fence contexts to
> > avoid deadlocks, just as easily as detecting recursion.
> >
> > To claim that userspace is not already able to control signaling is a
> > false dichotomy. Userspace is fully able to lock the HW resources
> > indefinitely (even if you cap every job, one can always build a chain of
> > jobs to circumvent any imposed timeout; a couple of seconds of timeout
> > becomes several months of jobs before the GPU runs out of memory and is
> > unable to accept any more jobs). Any ioctl that blocks while holding a HW
> > resource renders itself liable to a user-controllable livelock; you know
> > this, because it is blocking the signaling of those earlier jobs.
> > Worrying about things that are entirely within our control, and hence
> > avoidable, misses the point.
>
> You are completely missing the problem here.
>
> As you correctly pointed out, a userspace thread blocking on
> something is perfectly acceptable. And that's how
> bc9c80fe01a2570a2fd78abbc492b377b5fda068 works as well.
>
> And bc9c80fe01a2570a2fd78abbc492b377b5fda068 only implements waiting, so
> that during CS or the WAIT IOCTL we can block for the fence to appear.
>
>
> What happens in your approach is that the kernel starts to wait for
> userspace in its memory reclaim path. That is exactly the kind of
> problem Daniel's patches now point out immediately.

No, we don't.
-Chris
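
For context on "Daniel's dependency patches" above: that is the dma-fence signalling lockdep annotation series. Below is a minimal sketch, under the assumption that the dma_fence_begin_signalling()/dma_fence_end_signalling() helpers from that series are available; the function name itself is made up for illustration. The point being argued is that everything between the two annotations must complete without waiting on memory reclaim or on a fence that may never signal, which is why a fence completed by userspace trips the annotations.

#include <linux/dma-fence.h>

/*
 * Illustrative sketch only: everything between the begin/end annotations
 * is treated by lockdep as being on the fence-signalling critical path,
 * so blocking there on reclaim (e.g. a GFP_KERNEL allocation) or on
 * another fence that may never signal is reported as a potential deadlock.
 */
static void example_signal_job_fence(struct dma_fence *fence)
{
        bool cookie;

        cookie = dma_fence_begin_signalling();

        /*
         * A kmalloc(size, GFP_KERNEL) here would splat: reclaim may itself
         * wait on dma_fences, so the signalling path must not wait on
         * reclaim, and certainly must not wait on userspace.
         */

        dma_fence_signal(fence);
        dma_fence_end_signalling(cookie);
}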
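
As for the DAG check Chris mentions for rejecting userspace-created deadlocks, the mail does not spell out the mechanism. The idea of refusing a dependency that would close a cycle between proxy fences can be sketched roughly as follows; every structure and function name here is hypothetical, purely for illustration, and is not the i915 implementation (locking omitted).

#include <linux/dma-fence.h>
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/slab.h>

/* Hypothetical proxy fence that records which other proxies it waits on. */
struct proxy_fence {
        struct dma_fence base;
        struct list_head deps;          /* list of struct proxy_dep */
};

struct proxy_dep {
        struct proxy_fence *on;
        struct list_head link;
};

/* Depth-first walk: can @from reach @target through its dependencies? */
static bool proxy_reaches(struct proxy_fence *from, struct proxy_fence *target)
{
        struct proxy_dep *dep;

        if (from == target)
                return true;

        list_for_each_entry(dep, &from->deps, link)
                if (proxy_reaches(dep->on, target))
                        return true;

        return false;
}

/*
 * Refuse to let @waiter depend on @signaler if that would close a cycle;
 * the caller would then signal the fence with an error and cancel the work
 * instead of letting a deadlock form.
 */
static int proxy_add_dependency(struct proxy_fence *waiter,
                                struct proxy_fence *signaler)
{
        struct proxy_dep *dep;

        if (proxy_reaches(signaler, waiter))
                return -EDEADLK;

        dep = kmalloc(sizeof(*dep), GFP_KERNEL);
        if (!dep)
                return -ENOMEM;

        dep->on = signaler;
        list_add(&dep->link, &waiter->deps);
        return 0;
}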