From: John Harrison <John.C.Harrison@xxxxxxxxx>

Implemented a batch buffer submission scheduler for the i915 DRM driver.

The general theory of operation is that when batch buffers are submitted
to the driver, the execbuffer() code assigns a unique seqno value and
then packages up all the information required to execute the batch
buffer at a later time. This package is given over to the scheduler
which adds it to an internal node list. The scheduler also scans the
list of objects associated with the batch buffer and compares them
against the objects already in use by other buffers in the node list. If
matches are found then the new batch buffer node is marked as being
dependent upon the matching node. The same is done for the context
object. The scheduler also bumps up the priority of such matching nodes
on the grounds that the more dependencies a given batch buffer has the
more important it is likely to be.

The scheduler aims to have a given (tuneable) number of batch buffers in
flight on the hardware at any given time. If fewer than this are
currently executing when a new node is queued, then the node is passed
straight through to the submit function. Otherwise it is simply added to
the queue and the driver returns back to user land.

As each batch buffer completes, it raises an interrupt which wakes up
the scheduler. Note that it is possible for multiple buffers to complete
before the IRQ handler gets to run. Further, the seqno values of the
individual buffers are not necessarily incrementing as the scheduler may
have re-ordered their submission. However, the scheduler keeps the list
of executing buffers in order of hardware submission. Thus it can scan
through the list until a matching seqno is found and then mark all in
flight nodes from that point on as completed.

A deferred work queue is also poked by the interrupt handler.
When this wakes up it can do more involved processing such as actually
removing completed nodes from the queue and freeing up the resources
associated with them (internal memory allocations, DRM object
references, context reference, etc.). The work handler also checks the
in flight count and calls the submission code if a new slot has
appeared.

When the scheduler's submit code is called, it scans the queued node
list for the highest priority node that has no unmet dependencies. Note
that the dependency calculation is complex as it must take inter-ring
dependencies and potential preemptions into account. Note also that in
the future this will be extended to include external dependencies such
as the Android Native Sync file descriptors and/or the Linux dma-buf
synchronisation scheme.

If a suitable node is found then it is sent to execbuff_final() for
submission to the hardware. The in flight count is then re-checked and a
new node popped from the list if appropriate.

The scheduler also allows high priority batch buffers (e.g. from a
desktop compositor) to jump ahead of whatever is already running if the
underlying hardware supports pre-emption. In this situation, any work
that was pre-empted is returned to the queued list ready to be
resubmitted when no more high priority work is outstanding.

Various IGT tests are in progress to test the scheduler's operation and
will follow.

v2: Updated for changes in struct fence patch series and other changes
to underlying tree (e.g. removal of cliprects). Also changed priority
levels to be signed +/-1023 range and reduced mutex lock usage.

v3: More reuse of cached pointers rather than repeated dereferencing
(David Gordon).

Moved the dependency generation code out to a separate function for
easier readability. Also added in support for the read-read
optimisation.

Major simplification of the DRM file close handler.

Fixed up an overzealous WARN.

Removed unnecessary flushing of the scheduler queue when waiting for a
request.

v4: Removed user land fence/sync integration as this is dependent upon
the de-staging of the Android sync code. That de-staging is now being
done by someone else. The sync support will be added back in to the
scheduler in a separate patch series which must wait for the de-staging
to be landed first.

Added support for killing batches from contexts that were banned after
the batches were submitted to the scheduler.

Changed various comments to fix typos, update to reflect changes to the
code, correct formatting and line wrapping, etc. Also wrapped various
long lines and twiddled white space to keep the style checker happy.

Changed a bunch of BUG_ONs to WARN_ONs as apparently the latter are
preferred.

Used the correct array memory allocator function (kmalloc_array instead
of kmalloc).

Fixed a variable type (review comment by Joonas).

Fixed a WARN_ON firing incorrectly when removing killed nodes from the
scheduler's queue.

Dropped register definition update patch from this series. The changes
are all for pre-emption so it makes more sense for it to be part of that
series instead.

v5: Reverted power management changes as they apparently conflict with
mutex acquisition. Converted the override mask module parameter to a set
of boolean enable flags (just one in this patch set, but others are
added later for controlling pre-emption). [Chris Wilson]

Removed lots of whitespace from i915_scheduler.c and re-ordered it to
remove all forward declarations. Squashed down the i915_scheduler.c
sections of various patches into the initial 'start of scheduler' patch.
Thus the later patches simply hook in existing code into various parts
of the driver rather than adding the code as well. Added documentation
to various functions. Re-worked the submit function in terms of mutex
locking, error handling and exit paths. Split the delayed work handler
function in half. Made use of the kernel 'clamp' macro. [Joonas
Lahtinen]

Dropped the 'immediate submission' override option.
This was a half way house between full scheduler and direct submission
and was only really useful during early debug.

Added a re-install of the scheduler's interrupt hook around GPU reset.
[Zhipeng Gong]

Used lighter weight spinlocks.

v6: Updated to newer nightly (lots of ring -> engine renaming).

Added 'for_each_scheduler_node()' and 'assert_scheduler_lock_held()'
helper macros. Renamed 'i915_gem_execbuff_release_batch_obj' to
'i915_gem_execbuf_release_batch_obj'. Updated to use 'to_i915()' instead
of dev_private. Converted all enum labels to uppercase. Removed various
unnecessary WARNs. Renamed 'saved_objects' to just 'objs'. More code
refactoring. Removed even more white space. Added an
i915_scheduler_destroy() function instead of doing explicit clean up of
scheduler internals from i915_driver_unload(). Changed extra boolean
i915_wait_request() parameter to a flags word and consumed the original
boolean parameter too. Also, replaced the
i915_scheduler_is_request_tracked() function with
i915_scheduler_is_mutex_required() and
i915_scheduler_is_request_batch_buffer() as the need for the former has
gone away and it was really being used to ask the latter two questions
in a convoluted manner. Wrapped boolean 'flush' parameter to
intel_engine_idle() with an _flush() macro. [review feedback from Joonas
Lahtinen]

Moved scheduler module parameter declaration to correct place in
i915_params struct. [review feedback from Matt Roper]

Added an admin only check when setting the tuning parameters via debugfs
to prevent rogue user code trying to break the system with strange
settings. [review feedback from Jesse Barnes]

Added kerneldoc for intel_engine_idle().

Added running totals of 'flying' and 'queued' nodes rather than
re-calculating each time as a minor CPU performance optimisation.

Removed support for out of order seqno completion. All the prep work
patch series (seqno to request conversion, late seqno assignment, etc.)
that has now been done means that the scheduler no longer generates out
of order seqno completions. Thus all the complex code for coping with
such is no longer required and can be removed.

Fixed a bug in scheduler bypass mode introduced in the clean up code
refactoring of v5. The clean up function was seeing the node in the
wrong state and thus refusing to process it.

Improved the throttle by file handle feature by changing from a simple
'return to userland when full' scheme to a 'sleep on request' scheme.
The former could lead to busy polling and wasting lots of CPU time as
user land continuously retried the execbuf IOCTL in a tight loop. Now
the driver will sleep (without holding the mutex lock) on the oldest
request outstanding for that file and then automatically retry. This is
closer to the pre-scheduler behaviour of stalling on a full ring buffer.

[Patches against drm-intel-nightly tree fetched 13/04/2016 with struct
fence conversion patches applied]

Dave Gordon (2):
  drm/i915: Cache request pointer in *_submission_final()
  drm/i915: Add scheduling priority to per-context parameters

John Harrison (32):
  drm/i915: Add total count to context status debugfs output
  drm/i915: Prelude to splitting i915_gem_do_execbuffer in two
  drm/i915: Split i915_dem_do_execbuffer() in half
  drm/i915: Re-instate request->uniq because it is extremely useful
  drm/i915: Start of GPU scheduler
  drm/i915: Disable hardware semaphores when GPU scheduler is enabled
  drm/i915: Force MMIO flips when scheduler enabled
  drm/i915: Added scheduler hook when closing DRM file handles
  drm/i915: Added scheduler hook into i915_gem_request_notify()
  drm/i915: Added deferred work handler for scheduler
  drm/i915: Redirect execbuffer_final() via scheduler
  drm/i915: Keep the reserved space mechanism happy
  drm/i915: Added tracking/locking of batch buffer objects
  drm/i915: Hook scheduler node clean up into retire requests
  drm/i915: Added scheduler support to __wait_request() calls
  drm/i915: Added scheduler support to page fault handler
  drm/i915: Added scheduler flush calls to ring throttle and idle functions
  drm/i915: Add scheduler hook to GPU reset
  drm/i915: Added a module parameter to allow the scheduler to be disabled
  drm/i915: Support for 'unflushed' ring idle
  drm/i915: Defer seqno allocation until actual hardware submission time
  drm/i915: Added trace points to scheduler
  drm/i915: Added scheduler queue throttling by DRM file handle
  drm/i915: Added debugfs interface to scheduler tuning parameters
  drm/i915: Add early exit to execbuff_final() if insufficient ring space
  drm/i915: Added scheduler statistic reporting to debugfs
  drm/i915: Add scheduler support functions for TDR
  drm/i915: Enable GPU scheduler by default
  drm/i915: Add support for retro-actively banning batch buffers
  drm/i915: Allow scheduler to manage inter-ring object synchronisation
  drm/i915: Added debug state dump facilities to scheduler
  drm/i915: Scheduler state dump via debugfs

 drivers/gpu/drm/i915/Makefile              |    1 +
 drivers/gpu/drm/i915/i915_debugfs.c        |  336 +++++-
 drivers/gpu/drm/i915/i915_dma.c            |    5 +
 drivers/gpu/drm/i915/i915_drv.c            |    9 +
 drivers/gpu/drm/i915/i915_drv.h            |   58 +-
 drivers/gpu/drm/i915/i915_gem.c            |  156 ++-
 drivers/gpu/drm/i915/i915_gem_context.c    |   24 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  297 +++--
 drivers/gpu/drm/i915/i915_params.c         |    4 +
 drivers/gpu/drm/i915/i915_params.h         |    1 +
 drivers/gpu/drm/i915/i915_scheduler.c      | 1709 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h      |  180 +++
 drivers/gpu/drm/i915/i915_trace.h          |  225 +++-
 drivers/gpu/drm/i915/intel_display.c       |   10 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  161 ++-
 drivers/gpu/drm/i915/intel_lrc.h           |    1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   69 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    |    5 +-
 include/uapi/drm/i915_drm.h                |    1 +
 19 files changed, 3118 insertions(+), 134 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.c
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler.h

-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx