Re: [PATCH v6 00/34] GPU scheduler for i915 driver

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 25/04/2016 10:54, Chris Wilson wrote:
On Wed, Apr 20, 2016 at 06:13:18PM +0100, John.C.Harrison@xxxxxxxxx wrote:
From: John Harrison <John.C.Harrison@xxxxxxxxx>

Implemented a batch buffer submission scheduler for the i915 DRM driver.

The general theory of operation is that when batch buffers are
submitted to the driver, the execbuffer() code assigns a unique seqno
value and then packages up all the information required to execute the
batch buffer at a later time. This package is given over to the
scheduler which adds it to an internal node list.
That is called the request. Please use it properly. We pass a request to
the engine. The engine can hook up the scheduler in which case it may
defer execution of that request until the scheduler says it is ready.
But that is a conversation inside the engine, not intermingled with GEM.
It looks like I forgot to update that part of the description. The scheduler and its cover letter were written before the conversion of the driver from using seqnos everywhere to using requests. Indeed, that conversion was implemented as prep-work for the scheduler. So, yes the description should be updated to match the code and say 'assigns a request structure' not a seqno.

GEM needs to only care about its unique seqno along the timeline. We can
leave space for the scheduler to inject a second seqno if that is
convenient, but since the scheduler can keep a global list of inflight
requests, its retirement is also globally ordered even if we only
inspect each request wrt to its own timeline.
Currently, there is no 'own timeline'. The seqno timeline is global to the entire driver and not per engine, per context, or whatever. The conversion to more useful timelines still needs to be implemented. Until then, the scheduler must cope with the single global one.



The scheduler also
scans the list of objects associated with the batch buffer and
compares them against the objects already in use by other buffers in
the node list.
This part moves the policy of the GEM API and enshrines it into the
scheduler. Let GEM convert the implicit rules it has into explicit
dependencies, on to which we can add the explicit dependencies passed in
by the user. GEM has the information it needs to maintain its API, let's
not let that silliness spread.
So how should this be done? The scheduler needs some mechanism for ensuring that requests are only submitted to the hardware in a suitable order. And that information needs to persist for as long as the request still exists in an uncompleted state.


If matches are found then the new batch buffer node is
marked as being dependent upon the matching node. The same is done for
the context object. The scheduler also bumps up the priority of such
matching nodes on the grounds that the more dependencies a given batch
buffer has the more important it is likely to be.
Sounds like a rough cut at PI avoidance. This is easily abusable. It
sounds like you want a prio/nice split. Is that a good idea to introduce
in the first series?
Not sure how you mean that it can be abused?

The scheduler aims to have a given (tuneable) number of batch buffers
in flight on the hardware at any given time. If fewer than this are
currently executing when a new node is queued, then the node is passed
straight through to the submit function. Otherwise it is simply added
to the queue and the driver returns back to user land.
Keep it in the scheduler, I see you decided to add EAGAIN to GEM execbuf
when currently execbuf *blocks* in this situation. Don't use EAGAIN here
since you just make userspace busyspin over a very, very heavyweight
ioctl, and you have the power here to throttle userspace without
blocking.
Not sure where you are looking. The above description does not talk about returning EAGAIN at any point. It says the batch buffer is either sent directly to the hardware (if the case of idle hardware) or added to the scheduler's queue (in the case of busy hardware). Either way, the execbuf IOCTL returns success to the user. There is a later patch in the series which adds throttling to prevent a user from submitting arbitrarily large numbers of slow batches to the scheduler and effectively consuming unbounded amounts of memory due to the scheduler's queue not being limited by the size of a ring buffer. However, the throttling is done by waiting on an outstanding request so as to block until the queue depth has been reduced. It only returns an error to the user in the case where the wait could not be done for some reason (e.g. wedged GPU).


As each batch buffer completes, it raises an interrupt which wakes up
the scheduler. Note that it is possible for multiple buffers to
complete before the IRQ handler gets to run. Further, the seqno values
of the individual buffers are not necessary incrementing as the
scheduler may have re-ordered their submission. However, the scheduler
keeps the list of executing buffers in order of hardware submission.
Thus it can scan through the list until a matching seqno is found and
then mark all in flight nodes from that point on as completed.
No. You haven't built your timelines correctly. The idea behind the
timeline is that a request can wait upon its seqno and check it against
its own timeline, which is ordered only with other requests on its
timeline. Because of this, it is independent of whatever order the
scheduler executes the timelines in, each timeline is ordered.

A request can simply wait for its timeline to advance, completely
ignorant of the scheduler. (Request signaling may be driven by the
scheduler, but that is a lowlevel, not GEM or dma-buf/fence,
implementation detail. And only if the scheduler is coupled into the
user-interupt, but on execlists it will be using the context-switch
interrupt to driver itself, and for ringbuffer mode we have a choice of
user-interrupt or using pipe-control/dw-notify to keep the paths
separate.)
Again, you are assuming per context timelines which the driver does not yet have. That is planned work but is not yet implemented.


A deferred work queue is also poked by the interrupt handler.
Wait you did all that in the interrupt handler. NO.
Actually, the comment is again out of date. Since the seqno -> request conversion series, the scheduler no longer has to worry about out of order seqno values from the hardware. So all of the above IRQ mess no longer occurs.


If a suitable node is found then it is sent to execbuff_final() for
submission to the hardware.
That is an absolutely atrocious implementation. Don't pass it back up
the layers when you already have all the information you need packaged in
the request to submit it to hardware.

execbuf -> engine->add_request -> [scheduler] -> engine->submit_request

(conceptually, since what should happen is execlists gets it
context-switch interrupt and then asks what set of requests to submit
next. In a ringbuffer mode, a kthread/ktasklet would run off the
interrupt ask which request to execute next and do the write into the
ring for the switch into the new context and execution of the next batch,
and then write the RING_TAIL.)

At the fundamental level it looks like you have not introduced timelines
correctly or introduced the scheduler as a separate entity for deciding
which request to execute next (or if this request should preempt execution).
When this code was first written, there was not even the possibility of implementing per context timelines. Since that first version, there has been a lot of complex prep work (anti-OLR series, seqno to request conversion, struct fence conversion) which all makes the scheduler series a lot simpler. During that process, there has been amply opportunity to comment on the direction it was taking and to explain what could be improved and how.

The agreement on per context timelines is that while it would make things simpler still, it is yet more delay and could be left until after the current scheduler series has landed. Otherwise it is a never ending task to simply keep rebasing the scheduler patches onto a rapidly changing target let alone write entire extra patch series and rework the scheduler accordingly. Plus, we have multiple customers that require the scheduler functionality (and the pre-emption support that builds on top) so we need to get something shipping now rather than in another X years time.

-Chris


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux