On Wed, Jun 08, 2016 at 01:44:27PM +0100, Tvrtko Ursulin wrote:
> 
> On 08/06/16 13:34, Chris Wilson wrote:
> >On Wed, Jun 08, 2016 at 12:47:28PM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 08/06/16 12:24, Chris Wilson wrote:
> >>>On Wed, Jun 08, 2016 at 11:16:13AM +0100, Tvrtko Ursulin wrote:
> >>>>
> >>>>And why so long delays? It looks pretty lightweight to me.
> >>>>
> >>>>>>One alternative could perhaps be to add a waiter->wake_up vfunc and
> >>>>>>signalers could then potentially use a tasklet?
> >>>>>
> >>>>>Hmm, I did find that in order to reduce execlists latency, I had to
> >>>>>drive the tasklet processing from the signaler.
> >>>>
> >>>>What do you mean? The existing execlists tasklet? Now, would that work?
> >>>
> >>>Due to how dma-fence signals, the softirq is never kicked
> >>>(spin_lock_irq doesn't handle local_bh_enable()) and so we would only
> >>>submit a new task via execlists on a reschedule. That latency added
> >>>about 30% (30s on bsw) to gem_exec_parallel.
> >>
> >>I don't follow. User interrupts are separate from context complete,
> >>which drives the submission. How do fences interfere with the latter?
> >
> >The biggest user benchmark (ala sysmark) regression we have for
> >execlists is the latency in submitting the first request to hardware via
> >elsp (or at least the hw responding to and executing that batch;
> >the per-bb and per-ctx w/a are not free either). If we incur extra
> >latency in the driver in even adding the request to the queue for an
> >idle GPU, that is easily felt by userspace.
> 
> I still don't see how fences tie into that. But it is not so
> important since it was all along the lines of "do we really need a
> thread".

I was just mentioning in passing an issue I noticed when mixing fences
and tasklets! It boils down to spin_unlock_irq() not doing
local_bh_enable(), so trying to schedule a tasklet from inside a fence
callback incurs more latency than you would expect. Entirely unrelated
except for the signaling, fencing and their uses ;) (A minimal sketch of
that pattern is inlined further down.)

> >>>>>>>+int intel_engine_enable_signaling(struct drm_i915_gem_request *request)
> >>>>>>>+{
> >>>>>>>+	struct intel_engine_cs *engine = request->engine;
> >>>>>>>+	struct intel_breadcrumbs *b = &engine->breadcrumbs;
> >>>>>>>+	struct rb_node *parent, **p;
> >>>>>>>+	struct signal *signal;
> >>>>>>>+	bool first, wakeup;
> >>>>>>>+
> >>>>>>>+	if (unlikely(IS_ERR(b->signaler)))
> >>>>>>>+		return PTR_ERR(b->signaler);
> >>>>>>
> >>>>>>I don't see that there is a fallback if kthread creation failed. It
> >>>>>>should just fail in intel_engine_init_breadcrumbs if that happens.
> >>>>>
> >>>>>Because it is not fatal to using the GPU, just one optional function.
> >>>>
> >>>>But we never expect it to fail and it is not even dependent on
> >>>>anything user controllable. Just a random error which would cause
> >>>>user experience to degrade. If thread creation failed it means the
> >>>>system is in such a poor shape I would just fail the driver init.
> >>>
> >>>A minimally functional system is better than nothing at all.
> >>>GEM is not required for driver loading, interrupt driven dma-fences less
> >>>so.
> >>
> >>If you are so hot for that, how about vfuncing enable signaling in
> >>that case? Because I find the "have we created our kthread at driver
> >>init time successfully" question for every fence a bit too much.
> >
> >read + conditional that pulls in the cacheline we want? You can place
> >the test after the spinlock if you want to avoid the cost I suppose.
> >Or we just mark the GPU as wedged.
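To make the tasklet aside above concrete, here is a minimal sketch. This
is hypothetical code, not anything in i915; the names are invented and it
uses the tasklet API of this vintage. It shows why a tasklet scheduled
from a fence callback running under an irq-off spinlock does not get run
when the lock is released:

/* Hypothetical sketch, not i915 code: a tasklet scheduled from an
 * irq-off section (e.g. a fence callback invoked under
 * spin_lock_irqsave()) is not run when the lock is released.
 */
#include <linux/interrupt.h>
#include <linux/spinlock.h>

static void kick_submission(unsigned long data)
{
	/* this is where we would poke the execlists port / ELSP */
}
static DECLARE_TASKLET(submit_tasklet, kick_submission, 0);

static DEFINE_SPINLOCK(cb_lock);

static void signal_and_kick(void)
{
	unsigned long flags;

	spin_lock_irqsave(&cb_lock, flags);
	/* a fence callback running here does: */
	tasklet_schedule(&submit_tasklet); /* only marks the softirq pending */
	spin_unlock_irqrestore(&cb_lock, flags);

	/*
	 * Unlike local_bh_enable(), spin_unlock_irqrestore() never calls
	 * do_softirq(), so the tasklet stays pending until the next
	 * interrupt exit or ksoftirqd pass: that is the extra submission
	 * latency mentioned above.
	 */
}

If the signaler thread does the submission work itself instead of
bouncing through the softirq, that round trip disappears, which is what
I meant by driving the tasklet processing from the signaler.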
> 
> What I meant was to pass in different fence_ops at fence_init time,
> depending on whether or not the signaler thread was created. If the
> driver is to remain functional in that case, and
> fence->enable_signaling needs to keep returning errors, it sounds
> like a much more elegant solution than repeating the check at
> every fence->enable_signaling call.

Actually, looking at it, the code was broken for !thread as there was
no automatic fallback to polling by dma-fence. So the choice is between
doing that ourselves for an impossible failure case or just marking the
GPU as wedged on init.
-Chris
-- 
Chris Wilson, Intel Open Source Technology Centre
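For reference, a rough sketch of the "pick the fence_ops at fence_init()
time" idea discussed above. This is hypothetical, not the posted patch;
it uses the struct fence naming of this kernel generation (since renamed
dma_fence), and the *_fallback ops table, the i915_fence_* helpers and
the request field names are approximations:

/* Hypothetical sketch, not the posted patch: select the fence_ops when
 * the request's fence is initialised, depending on whether the signaler
 * kthread exists.
 */
static const struct fence_ops i915_fence_ops_irq = {
	.get_driver_name	= i915_fence_get_driver_name,
	.get_timeline_name	= i915_fence_get_timeline_name,
	.enable_signaling	= i915_fence_enable_signaling, /* hands off to b->signaler */
	.signaled		= i915_fence_signaled,
	.wait			= fence_default_wait,
	.release		= i915_fence_release,
};

static const struct fence_ops i915_fence_ops_fallback = {
	.get_driver_name	= i915_fence_get_driver_name,
	.get_timeline_name	= i915_fence_get_timeline_name,
	/*
	 * Still needs an .enable_signaling of its own (e.g. arming a
	 * timer to poll the seqno); the core provides no automatic
	 * polling fallback, which is the breakage noted above.
	 */
	.enable_signaling	= i915_fence_enable_signaling_poll,
	.signaled		= i915_fence_signaled,
	.wait			= fence_default_wait,
	.release		= i915_fence_release,
};

static void i915_gem_request_init_fence(struct drm_i915_gem_request *req)
{
	struct intel_breadcrumbs *b = &req->engine->breadcrumbs;
	const struct fence_ops *ops = IS_ERR(b->signaler) ?
		&i915_fence_ops_fallback : &i915_fence_ops_irq;

	fence_init(&req->fence, ops, &req->lock,
		   req->engine->fence_context, req->seqno);
}

With that in place enable_signaling never has to report the missing
kthread per fence, which was the point above; the open question remains
whether the polling fallback is worth writing for a failure we never
expect, versus just wedging the GPU on init.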