Quoting Tvrtko Ursulin (2020-07-14 10:02:35)
>
> On 06/07/2020 07:19, Chris Wilson wrote:
> > It is reasonably common for userspace (even modern drivers like iris) to
> > reuse an active address for a new buffer. This would cause the
> > application to stall under its mutex (originally struct_mutex) until the
> > old batches were idle and it could synchronously remove the stale PTE.
> > However, we can queue up a job that waits on the signal for the old
> > nodes to complete and upon those signals, remove the old nodes replacing
> > them with the new ones for the batch. This is still CPU driven, but in
> > theory we can do the GTT patching from the GPU. The job itself has a
> > completion signal allowing the execbuf to wait upon the rebinding, and
> > also other observers to coordinate with the common VM activity.
> >
> > Letting userspace queue up more work lets it do more stuff without
> > blocking other clients. In turn, we take care not to let it queue too
> > much concurrent work, creating a small number of queues for each
> > context to limit the number of concurrent tasks.
>
> This is a monster patch.. what is the end result here? If there are a
> few conflicts they can go async, but if more than "concurrency width"
> need evict then it will be synchronous?

No, if userspace gets far ahead of the binder, the tasks join the queue
and userspace continues on to complete the submission [without blocking,
as far as is possible; the waits I have not yet killed should only be
hit if userspace exhausts the entire GTT for itself].

The throttling for userspace is intended to be on acquiring the
timeline->mutex and checking for sufficient ring space, i.e. right at
the very start where we have the O_NONBLOCK check.

> Could you do without this patch for the first implementation? Or come up
> with ideas to split it up and so make understanding and review manageable?

Challenging. The task is to hold vm->mutex for all of the i915_vma_pin()
calls, which entails breaking i915_vma_pin() into its steps [those
before / during / after vm->mutex].

The intention is really to use the lessons learnt here to generalise it
again.

> > +static int best_hole(struct drm_mm *mm, struct drm_mm_node *node,
> > +                     u64 start, u64 end, u64 align)
> > +{
> > +        struct drm_mm_node *hole;
> > +        u64 size = node->size;
> > +
> > +        do {
> > +                hole = __best_hole(mm, size);
> > +                if (!hole)
> > +                        return -ENOSPC;
> > +
> > +                node->start = round_up(max(start, drm_mm_hole_node_start(hole)),
> > +                                       align);
> > +                if (min(drm_mm_hole_node_end(hole), end) >=
> > +                    node->start + node->size)
> > +                        return drm_mm_reserve_node(mm, node);
> > +
> > +                /*
> > +                 * Too expensive to search for every single hole every time,
> > +                 * so just look for the next bigger hole, introducing enough
> > +                 * space for alignments. Finding the smallest hole with ideal
> > +                 * alignment scales very poorly, so we choose to waste space
> > +                 * if an alignment is forced. On the other hand, simply
> > +                 * randomly selecting an offset in 48b space will cause us
> > +                 * to use the majority of that space and exhaust all memory
> > +                 * in storing the page directories. Compromise is required.
> > +                 */
> > +                size = hole->hole_size + align;
> > +        } while (1);
> > +}
>
> evict_for_* and all above, feels like it is too much for
> i915_gem_execbuffer.c. How about that goes to i915_gem_evict.c? Apart
> from it depending on eb_vm_work..

Right. It's also a question of whether this should work as part of the
vm (gt/intel_gtt*) or as part of the user rules above that (gem/foo*).

At the moment, execbuf should be the *only* place that cares about
multiple PIN_USER and the risk of eviction interlocks.

> Best hole at least operates solely on drm_mm so should go out,
> presumably in preparation for moving into drm core.

Yikes. The algorithm is very wasteful and doesn't do a good job of
picking a good candidate, so I wouldn't suggest it is suitable for wide
use. It's just less bad than randomly picking (and so spreading evenly)
over the 48b address space.

There was an attempt to do a fast drm_mm alignment search, but it has
not yet yielded something we can use. So for the time being this is a
hack to avoid the O(N^2) search.

> Hm, i915_gem_eb_vm.c for most of the rest?

It's currently 25% of i915_gem_execbuffer.c, ~1000 / 3750 lines, for
the entire chunk of eb_reserve_vm; about 500 of that is the
search/evict/bind code itself. A sizeable chunk, I'm just not sure
where it really belongs. I do agree that it doesn't really belong here,
in one huge file.

> > +static void eb_vma_work_release(struct dma_fence_work *base)
> > +{
> > +        struct eb_vm_work *work = container_of(base, typeof(*work), base);
> > +
> > +        __eb_bind_vma(work);
>
> Looks like on some early cancels this can try to cleanup things which
> haven't been set up yet, like the dereference of work->bind[n]. I don't
> see any protection against that.

work->count = 0, then work->count = as_far_as_is_valid.

> Maybe it would be enough if work->count was set after the work has been
> set up.

Right :-p
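Roughly like this, i.e. only publish each bind[] slot to the release
path once it has been completely filled in (sketch only, not the actual
patch; nbind and eb_prepare_bind() are placeholders, bind[] and count
as in the quoted hunks):

	work->count = 0;
	for (n = 0; n < nbind; n++) {		/* nbind: placeholder */
		/*
		 * Fill in work->bind[n] completely before bumping
		 * work->count, so eb_vma_work_release() on an early
		 * cancel only ever walks fully initialised entries.
		 */
		err = eb_prepare_bind(work, n);	/* placeholder helper */
		if (err)
			break;		/* [0, work->count) stays valid */

		work->count++;
	}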
> Why is a callback called eb_vma_work_release doing the binds anyway?
> I'd expect release to clean up, not do the work.

Because we have committed the changes in place to i915_vma and have no
rollback. So if we cancel in the middle of those, the work done so far
must still be committed.

It's because of

static void fence_work(struct work_struct *work)
{
	struct dma_fence_work *f = container_of(work, typeof(*f), work);

	if (!f->dma.error) {
		int err;

		err = f->ops->work(f);
		if (err)
			dma_fence_set_error(&f->dma, err);
	}

	fence_complete(f);
	dma_fence_put(&f->dma);
}

so if we always call f->ops->work(), then the eb release can be sane,
but I need to check the other users. I haven't made my mind up whether
calling f->ops->work() when already in error is a good convention or
not.
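For reference, the always-call variant would look roughly like this
(sketch only; it pushes the error check down into each op, so that
release-style work such as __eb_bind_vma() always gets the chance to
commit what has already been done):

static void fence_work(struct work_struct *work)
{
	struct dma_fence_work *f = container_of(work, typeof(*f), work);
	int err;

	/*
	 * Sketch: always invoke the op, even if the fence already
	 * carries an error; each op is then responsible for checking
	 * f->dma.error and committing/cleaning up whatever it must.
	 */
	err = f->ops->work(f);
	if (err && !f->dma.error)
		dma_fence_set_error(&f->dma, err);

	fence_complete(f);
	dma_fence_put(&f->dma);
}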
> > +static int wait_for_unbinds(struct i915_execbuffer *eb,
> > +                            struct list_head *unbound,
> > +                            int pass)
> > +{
> > +        struct eb_vma *ev;
> > +        int err;
> > +
> > +        list_for_each_entry(ev, unbound, unbound_link) {
> > +                struct i915_vma *vma = ev->vma;
> > +
> > +                GEM_BUG_ON(ev->flags & __EXEC_OBJECT_HAS_PIN);
> > +
> > +                if (drm_mm_node_allocated(&vma->node) &&
> > +                    eb_vma_misplaced(ev->exec, vma, ev->flags)) {
> > +                        err = i915_vma_unbind(vma);
> > +                        if (err)
> > +                                return err;
> > +                }
> > +
> > +                /* Wait for previous to avoid reusing vma->node */
> > +                err = i915_vma_wait_for_unbind(vma);
>
> This waits on vma->active, presumably the current execbuf hasn't added
> anything to it so far..

This is after the current execbuf has released everything it added, so
the wait here will either be for its own incomplete error work, or for
someone else's.

> > +                if (err)
> > +                        return err;
> > +        }
> > +
> > +        switch (pass) {
> > +        default:
> > +                return -ENOSPC;
> > +
> > +        case 2:
> > +                /*
> > +                 * Too fragmented, retire everything on the timeline and so
> > +                 * make it all [contexts included] available to evict.
> > +                 */
> > +                err = wait_for_timeline(eb->context->timeline);
> > +                if (err)
> > +                        return err;
> > +
> > +                fallthrough;
> > +        case 1:
> > +                /* XXX ticket lock */
> > +                if (i915_active_wait(&eb->context->vm->binding))
>
> .. and this waits on itself? Or?

For contemporary execbufs with inflight binders. If we fall off the
greedy path, where we aim to reuse our resident working set and only
that set, we need to coordinate with everyone else also trying to use
the vm, so that we can allocate the entire thing for ourselves should
we need to.

There's a thundering herd problem here in that we may never quiesce a
shared vm to claim it all for us. Same old risks, and we can't do an
extended hold of vm->mutex unless we are able to avoid all allocations
after that point. And that's quite tricky.

> > +        /* No allocations allowed beyond this point */
> > +        if (mutex_lock_interruptible(&vm->mutex))
> > +                return eb_vm_work_cancel(work, -EINTR);
> > +
> > +        err = eb_vm_throttle(work);
>
> The effect is installing an async await on the current binding work,
> with the signaler being what is currently in vm->binding. This could be
> another execbuf from the same client?

Yes, it's keyed on the GEM context's async_id, so it will only be
queued behind execbufs from the same context. Of course, it will
accumulate waits for the actual vmas, which may be across clients, as
well.
-Chris