Re: [PATCH 05/51] drm/i915: Add return code check to i915_gem_execbuffer_retire_commands()

On 05/03/2015 14:44, Daniel Vetter wrote:
On Thu, Mar 05, 2015 at 01:06:10PM +0000, John Harrison wrote:
On 26/02/2015 02:26, Daniel Vetter wrote:
On Wed, Feb 25, 2015 at 11:17:00PM +0100, Daniel Vetter wrote:
On Fri, Feb 13, 2015 at 11:48:14AM +0000, John.C.Harrison@xxxxxxxxx wrote:
From: John Harrison <John.C.Harrison@xxxxxxxxx>

For some reason, the i915_add_request() call in
i915_gem_execbuffer_retire_commands() was explicitly having its return code
ignored. The _retire_commands() function itself was 'void'. Given that
_add_request() can fail without dispatching the batch buffer, this seems odd.
I was so convinced we've had a commit somewhere explaining this, but
apparently not.

The deal is that after the dispatch call the batch is committed and
there's no going back any more, which also means we can't return an error
code from the ioctl. So if you return -EIO or -ENOMEM that's a pretty bad
lie and you really have to ignore that error code.

Again I've tried to dig up the commit for that but that was lost in the
maze of the past 5 years of changes. We've had piles of older approaches
to deal with this issue:
- Don't even emit a request, just mark objects as gpu dirty. Only when
   waiting did we emit flushes and requests, which again gave us a
   context in which to return the error. This resulted in horrible latency
   since flushes happened way too late, and all that book-keeping was not
   worth it at all. Don't ask ;-)
- Emit flushes right away, but if we fail to alloc the request, set the
   outstanding lazy request bit. The job of the check_olr function used in
   waits was to notice that and retry the allocation.
- Preallocate the request, but that still leaves the possibility that the
   gpu dies. But since we've committed, hangcheck will clean this up and we
   can just ignore the -EIO.

Given all that backstory: Why does add_request/retire_commands suddenly
need to fail?
The problem is that if add_request() fails and the request is not added to
ring->request_list then it will be lost. As soon as the execbuff code
returns, there is no longer a request pointer floating around, so it
can't have add_request() called on it later. Thus the request will never be
retired, the objects, context, etc. never unreferenced, and basically lots of
stuff will be leaked. Without the OLR to hoover up the failures, the
add_request() call really must not be allowed to give up.
That's exactly what I mean, add_request can't fail. The other issue is
that you can't fail execbuf at the point you call add_request either any
more since we've already (at least potentially) started executing the
batch.

It's actually worse since it's not just -EIO but also -EINTR, returned by
intel_ring_begin when we're thrashing the gpu a bit too badly with
requests. Which means we really need to guarantee that the request is
completed properly eventually, since it's not just for fatal gpu hangs.

Atm that's done by only clearing outstanding_lazy_request after we've
really emitted the request fully. That guarantees that even when parts of
the request emission to the ringbuf fails we'll retry on the next wait if
needed.

A possible fix to make this infallible would be to reserve some fixed
amount of ringbuf credit at request creation time and then consume it
here. Of course we'd need checks to make sure we never use more ringspace
than what we reserve. To avoid massive churn we could convert
I915_RING_FREE_SPACE into a variable and increase it enough when
allocating the request. And then reduce it again at the start of
add_request.
-Daniel
I don't think you can guarantee to reserve enough space at request creation
time. You have no idea how much space will be required by whatever piece of
code is wanting the request. It could be a few words or it might be reams
and reams of workaround goo. One of the scheduler patches does improve this
by doing a 'large enough' ring_begin() at the start of the execbuffer
submission path in order to prevent out of space issues and other such
problems half way through that could lead to a partial submission. However,
even that is not absolutely guaranteed 100% failure proof.

How about changing add_request() so that it can't fail? As in, the cache
flush call and the emit request call can still fail due to running out of
ring space, but add_request() just ignores that and keeps going anyway. That
way the request is still correctly tracked and will be retired eventually.
The only issues are unflushed caches and no seqno interrupt being generated.
However, if the assumption is that another request will be submitted shortly
(which is extremely likely if the system is busy enough to cause a failure
during add_request!) then this will be fine. The following request will
flush the caches and write the next seqno to the ringbuffer. When that
pops out, both the broken request and the new one will be considered
complete and can be retired. The remaining issue is if the broken request is
the last one to be submitted and is then waited on. In that case, you will
get a timeout/hang as the request will never complete. Although that could
be worked around by setting a 'failed request' flag in the ring and having
the wait code (or even the currently redundant check_olr function) look at
that and attempt a brand new (but empty) request submission, as sketched
below.
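
Purely as an illustrative sketch of that idea (every name here, toy_ring,
failed_request_pending and so on, is invented; this is not driver code):

#include <stdbool.h>
#include <stddef.h>

struct toy_request {
	unsigned int seqno;
	struct toy_request *next;
};

struct toy_ring {
	struct toy_request *request_list;
	bool failed_request_pending;
};

/* Stand-in for the cache flush + seqno write, which in the real driver
 * can fail with -EINTR/-ENOSPC out of intel_ring_begin(). */
static int emit_request(struct toy_ring *ring, struct toy_request *req)
{
	(void)ring; (void)req;
	return 0;
}

/* add_request() made void: emission failure no longer propagates. */
static void add_request(struct toy_ring *ring, struct toy_request *req)
{
	if (emit_request(ring, req))
		/* Caches unflushed and no seqno written. The next
		 * request's emission will complete this one too, so
		 * just remember the hole in case this turns out to be
		 * the last request anyone waits on. */
		ring->failed_request_pending = true;

	/* Track the request regardless, so it is always retired. */
	req->next = ring->request_list;
	ring->request_list = req;
}

/* Wait path (or the check_olr slot): if the tail request never got
 * its seqno written, push an empty request to flush it out. */
static void flush_failed_request(struct toy_ring *ring,
				 struct toy_request *empty)
{
	if (!ring->failed_request_pending)
		return;
	ring->failed_request_pending = false;
	add_request(ring, empty);
}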

Or maybe a simpler solution is to just keep a 'last failed request' pointer
in the ring. Sort of a not-quite-OLR. If add_request() fails, it saves the
request pointer here instead of adding it to the request list. A subsequent
request allocation call starts by checking the 'last failed' value and
retries the add_request() call if present. At that point it is allowed to
fail. I guess it still needs to be done by check_olr as well to prevent a
wait from stalling if no other requests are submitted.
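
Again only as an invented illustration, the not-quite-OLR variant could
look roughly like this, with request_alloc() retrying the parked request
before handing out a new one:

#include <stdlib.h>

struct parked_request {
	unsigned int seqno;
};

struct parked_ring {
	struct parked_request *last_failed_request;
};

/* Here add_request() is allowed to fail again, because a failure just
 * leaves the request parked for the next retry. Stubbed out. */
static int try_add_request(struct parked_ring *ring,
			   struct parked_request *req)
{
	(void)ring; (void)req;
	return 0;	/* or -ENOSPC/-EINTR in real life */
}

static struct parked_request *request_alloc(struct parked_ring *ring)
{
	/* Retry the parked request first so it can't be lost. */
	if (ring->last_failed_request) {
		if (try_add_request(ring, ring->last_failed_request))
			return NULL;	/* still no ring space, back off */
		ring->last_failed_request = NULL;
	}
	return calloc(1, sizeof(struct parked_request));
}

/* On failure the request is parked instead of being dropped. */
static void add_request_or_park(struct parked_ring *ring,
				struct parked_request *req)
{
	if (try_add_request(ring, req))
		ring->last_failed_request = req;
}
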
Imo reserving a bit of ring space for each add_request should be solid.
Userspace uses the exact same reservation logic for adding end-of-batch
workarounds. The only thing needed to make this solid is to WARN if
add_request ends up using more ring space than what we've reserved (not
just when it actually runs out, that obviously doesn't happen often
enough for testing).
The problem is that there could be multiple requests being processed in parallel. This is especially true with the scheduler. Userland could submit a whole stream of batches that all get queued up in the scheduler. Only later do they get submitted to the hardware. The request must be allocated up front because there is no other means of tracking them. But reserving space at that point won't work because you either end up reserving massive amounts of space if the reserve is cumulative, or not enough if only one slot is reserved.

Everything else just re-adds the OLR through the backdoor, which is kinda
what we wanted to avoid from an accounting pov. Because then you have again
some random request outstanding which scoops up everything it encounters.
Not quite. The difference is that with something like an outstanding failed request rather than a lazy one, there is still the segregation of work. The failed request will be posted and added to the request list in its entirety before a new request is allocated and used for the new work.


For the ringbuf interface itself I think we only need 3 pieces:
- ringbuf_reserve(space): Ensures there's @space available in the ring and
   then sets that as ring->reserved_space. ring_free_space needs to
   subtract that ofc. This would be in the alloc_request function.

- ringbuf_use_reserve(): sets ring->reserved_space to 0 so that
   intel_ring_begin can start eating into the reserve. This would be at the
   top of add_request.

- ringbuf_check_reserve(): This would be called at the end of add_request
   and WARNs if we've used more ring space than what we've promised.
   Obviously needs some book-keeping between use and check but that's just a
   detail. If you go with a flag instead of clearing ->reserved_space you
   can even enforce that these three functions are always called in this
   order and don't end up being nested wrongly.
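
A rough, compile-ready sketch of how those three pieces could fit together,
with a toy ring structure and assert() standing in for WARN_ON; the field
and function names are assumptions for illustration, not the real i915 code:

#include <assert.h>
#include <stdbool.h>

struct toy_ring {
	int space;		/* raw free bytes in the ring */
	int reserved_space;	/* credit held back for add_request */
	int reserved_size;	/* what was promised, for the final check */
	bool reserve_in_use;
};

/* alloc_request time: hold back enough room for request emission. */
static void ringbuf_reserve(struct toy_ring *ring, int space)
{
	assert(!ring->reserve_in_use && !ring->reserved_space);
	ring->reserved_space = space;
	ring->reserved_size = space;
}

/* Free space as ordinary ring_begin callers see it: the reserve is
 * subtracted so normal emission can never eat into the credit. */
static int ringbuf_free_space(const struct toy_ring *ring)
{
	return ring->space - ring->reserved_space;
}

/* Top of add_request: let ring_begin start eating into the reserve. */
static void ringbuf_use_reserve(struct toy_ring *ring)
{
	assert(!ring->reserve_in_use);
	ring->reserve_in_use = true;
	ring->reserved_space = 0;
}

/* Emission book-keeping: consume space, and while the reserve is in
 * use also count it against what was promised. */
static void ringbuf_emit(struct toy_ring *ring, int bytes)
{
	ring->space -= bytes;
	if (ring->reserve_in_use)
		ring->reserved_size -= bytes;
}

/* End of add_request: complain if more was used than promised. */
static void ringbuf_check_reserve(struct toy_ring *ring)
{
	assert(ring->reserve_in_use);
	assert(ring->reserved_size >= 0);	/* WARN in the real thing */
	ring->reserve_in_use = false;
	ring->reserved_size = 0;
}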

Cheers, Daniel
