On Wed, 2018-08-01 at 15:22 +0100, Chris Wilson wrote:
> Quoting Jakub Bartmiński (2018-08-01 14:56:11)
> > [...]
> > +	/* We have already tried preempting, but the hardware did not react */
> > +	if (engine->hangcheck.try_preempt)
> > +		return false;
> > +
> > +	active_request = i915_gem_find_active_request(engine);
>
> No ownership here of rq. You either need a new function to return a
> reference, or careful application of rcu.

This is a fair point; I'm assuming that without taking ownership here we
may end up with a nasty race?

> > [...]
> > +	/*
> > +	 * If the workload is preemptible but its context was closed
> > +	 * we force the engine to skip its execution instead.
> > +	 */
> > +	if (i915_gem_context_is_closed(active_context))
> > +		skip_seqno = true;
>
> Nope. Even if the context is closed, its results may be depended upon
> by others; both GPU and userspace.

While there may be dependencies, we currently break those dependencies
anyway by resetting the entire engine, don't we? Couldn't we signal an
error for the skipped requests, as if we had reset the engine, and then
continue with the execution of the remaining requests? If we don't
reset/skip, we are left with a request that potentially spins forever,
with its dependent requests blocked, and since this is essentially the
halting problem we have to draw the line somewhere.

> [...]
> So the gist of this is to excuse lowest priority user contexts from
> hangcheck, do we want to go one step further and ask userspace to opt
> in explicitly?

Making it an explicit opt-in was the idea here; setting the lowest
priority was just a rough way of accomplishing that.

> Though the argument that as long as its preemptible (and isolated) it
> is not acting as DoS so we don't need to enforce a timeout. That also
> suggests that we need not reset until the request is blocking others
> or userspace.
So if I'm understanding correctly, you are suggesting that if we have
preemption enabled we could drop the hangcheck entirely, and just reset
if preemption fails?

> As discussed that seems reasonable, but I think this does need to
> actually trigger preemption.
> -Chris

I'm not sure what you mean here by "actually trigger preemption", since
this patch does trigger the preemption; could you clarify?

As I've mentioned before, one other problem with this patch is that we
can't have more than one "background" request running before the first
one finishes. This could be solved at hangcheck time by putting the
background request at the back of the background-priority queue instead
of at the front during preemption, but that breaks if the background
requests depend on each other. Those dependencies could be resolved in
the tasklet (only in the case where hangcheck was called, so it wouldn't
slow us down too much; they could also be resolved in hangcheck itself,
but the timeline could change between hangcheck and the tasklet, which I
think would make doing it correctly more complicated).

- Jakub
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx