Re: [PATCH 48/55] drm/i915: Add *_ring_begin() to request allocation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 17/06/2015 16:52, Chris Wilson wrote:
On Wed, Jun 17, 2015 at 04:54:42PM +0200, Daniel Vetter wrote:
On Wed, Jun 17, 2015 at 03:27:08PM +0100, Chris Wilson wrote:
On Wed, Jun 17, 2015 at 03:31:59PM +0200, Daniel Vetter wrote:
On Fri, May 29, 2015 at 05:44:09PM +0100, John.C.Harrison@xxxxxxxxx wrote:
From: John Harrison <John.C.Harrison@xxxxxxxxx>

Now that the *_ring_begin() functions no longer call the request allocation
code, it is finally safe for the request allocation code to call *_ring_begin().
This is important to guarantee that the space reserved for the subsequent
i915_add_request() call does actually get reserved.

v2: Renamed functions according to review feedback (Tomas Elf).

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
Still has my question open from the previos round:

http://mid.gmane.org/20150323091030.GL1349@phenom.ffwll.local

Note that this isn't all that unlikely with GuC mode since there the
ringbuffer is substantially smaller (due to firmware limitations) than
what we allocate ourselves right now.
Looking at this patch, I am still fundamentally opposed to reserving
space for the request. Detecting a request that wraps and cancelling
that request (after the appropriate WARN for the overlow) is trivial and
such a rare case (as it is a programming error) that it should only be
handled in the slow path.
I thought the entire point here that we don't have request half-committed
because the final request ringcmds didn't fit in. And that does require
that we reserve a bit of space for that postamble.

I guess if it's too much (atm it's super-pessimistic due to ilk) we can
make per-platform reservation limits to be really minimal.

Maybe we could go towards a rollback model longterm of rewingind the
ringbuffer. But if there's no clear need I'd like to avoid that
complexity.
Even if you didn't like the rollback model which helps handling the
partial state from context switches and what not, if you run out of
ringspace you can set the GPU as wedged. Issuing a request that fills
the entire ringbuffer is a programming bug that needs to be caught very
early in development.
-Chris


I'm still confused by what you are saying in the above referenced email. Part of it is about the sanity checks failing to handle the wrapping case correctly which has been fixed in the base reserve space patch (patch 2 in the series). The rest is either saying that you think we are potentially wrappping too early and wasting a few bytes of the ring buffer or that something is actually broken?

Point 2: 100 bytes of reserve, 160 bytes of execbuf and 200 bytes remaining. You seem to think this will fail somehow? Why? The wait_for_space(160) in the execbuf code will cause a wrap because the the 100 bytes for the add_request reservation is added on and the wait is actually being done for 260 bytes. So yes, we wrap earlier than would otherwise have been necessary but that is the only way to absolutely guarantee that the add_request() call cannot fail when trying to do the wrap itself.

As Chris says, if the driver is attempting to create a single request that fills the entire ringbuffer then that is a bug that should be caught as soon as possible. Even with a Guc, the ring buffer is not small compared to the size of requests the driver currently produces. Part of the scheduler work is to limit the number of batch buffers that a given application/context can have outstanding in the ring buffer at any given time in order to prevent starvation of the rest of the system by one badly behaved app. Thus completely filling a large ring buffer becomes impossible anyway - the application will be blocked before it gets that far.

Note that with the removal of the OLR, all requests now have a definite start and a definite end. Thus the scheme could be extended to provide rollback of the ring buffer. Each new request takes a note of the ring pointers at creation time. If the request is cancelled it can reset the pointers to where they were before. Thus all half submitted work is discarded. That is a much bigger semantic change however, so I would really like to get the bare minimum anti-OLR patch set in first before trying to do fancy extra features.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux