[PATCH 1/1 RFC] drivers/gpu/drm/i915: Documentation for batchbuffer submission

kevin.rogovin@xxxxxxxxx · Fri, 16 Feb 2018 16:04:22 +0200

From: Kevin Rogovin <kevin.rogovin@xxxxxxxxx>

Signed-off-by: Kevin Rogovin <kevin.rogovin@xxxxxxxxx>
---
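As a reviewer aid for the relocation paragraph in "Intel GPU Basics", here
is a minimal user-space toy model of the idea. It is only a sketch under
stated assumptions, not the driver's implementation: struct toy_reloc,
struct toy_object and toy_relocate() are hypothetical names invented for
this illustration, and the real uAPI relocation entry carries more fields.
The model also assumes that user space pre-wrote "presumed address plus
delta" into the batch, so a correct guess needs no patching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation entry (not the real uAPI struct). */
struct toy_reloc {
	size_t   batch_offset;    /* byte offset in the batch to patch */
	size_t   target;          /* index of the referenced object */
	uint64_t delta;           /* added to the object's GPU address */
	uint64_t presumed_offset; /* address user space guessed */
};

/* Hypothetical stand-in for a GEM BO already bound into the (PP)GTT. */
struct toy_object {
	uint64_t gpu_addr;
};

/*
 * Patch every location whose presumed address was wrong and report the
 * real address back through presumed_offset. Returns the number of
 * locations actually rewritten.
 */
static unsigned int
toy_relocate(uint8_t *batch, size_t batch_size,
	     const struct toy_object *objs, size_t nobjs,
	     struct toy_reloc *relocs, size_t nrelocs)
{
	unsigned int patched = 0;
	size_t i;

	for (i = 0; i < nrelocs; i++) {
		struct toy_reloc *r = &relocs[i];
		uint64_t addr, value;

		/* Reject entries that point outside the object list or batch. */
		assert(r->target < nobjs);
		assert(batch_size >= sizeof(value) &&
		       r->batch_offset <= batch_size - sizeof(value));

		addr = objs[r->target].gpu_addr;
		if (r->presumed_offset == addr)
			continue; /* user space guessed right: skip the write */

		value = addr + r->delta;
		memcpy(batch + r->batch_offset, &value, sizeof(value));
		r->presumed_offset = addr; /* tell user space where the BO is */
		patched++;
	}

	return patched;
}
```

A caller would fill an array of toy_reloc entries, one per GPU address
embedded in the batch, and call toy_relocate() once all objects have an
address. Entries whose presumed_offset already matches cost no write, which
is the saving the documentation text describes.
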
 Documentation/gpu/i915.rst                 | 109 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +++
 2 files changed, 119 insertions(+)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881b00dc..36b3ade85839 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -13,6 +13,18 @@ Core Driver Infrastructure
 This section covers core driver infrastructure used by both the display
 and the GEM parts of the driver.
 
+Initialization
+--------------
+
+Initialization of the i915 driver is handled mainly by
+:c:func:`i915_driver_load`; from this function one can find the key
+data (in particular :c:type:`drm_driver` for GEM) describing the entry
+points into the driver from user space.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.c
+   :functions: i915_driver_load
+
+
 Runtime Power Management
 ------------------------
 
@@ -249,6 +261,102 @@ Memory Management and Command Submission
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
+Intel GPU Basics
+----------------
+
+An Intel GPU has multiple engines. There are several engine types.
+
+- RCS is the engine for rendering 3D and performing compute; it is named ``I915_EXEC_RENDER`` in user space (``I915_EXEC_DEFAULT`` also selects it).
+- BCS is a blitting (copy) engine; it is named ``I915_EXEC_BLT`` in user space.
+- VCS is a video encode and decode engine; it is named ``I915_EXEC_BSD`` in user space.
+- VECS is a video enhancement engine; it is named ``I915_EXEC_VEBOX`` in user space.
+
+The Intel GPU family is a family of integrated GPUs using Unified Memory
+Access. To have the GPU "do work", user space feeds the GPU batchbuffers
+via one of the ioctls ``DRM_IOCTL_I915_GEM_EXECBUFFER``, ``DRM_IOCTL_I915_GEM_EXECBUFFER2``
+or ``DRM_IOCTL_I915_GEM_EXECBUFFER2_WR``. Most such batchbuffers instruct the
+GPU to perform work (for example rendering), and that work needs memory from
+which to read and memory to which to write. All memory is encapsulated within
+GEM buffer objects (usually created with the ioctl ``DRM_IOCTL_I915_GEM_CREATE``).
+An ioctl providing a batchbuffer for the GPU to execute also lists all GEM
+buffer objects that the batchbuffer reads and/or writes.
+
+The GPU has its own memory management and address space. The kernel driver
+maintains the memory translation table for the GPU. For older GPUs (i.e. those
+before Gen8), there is a single global translation table, the global
+Graphics Translation Table (GTT). For newer generation GPUs each hardware
+context has its own translation table, called the Per-Process Graphics
+Translation Table (PPGTT). Note that although PPGTT is named per-process, it
+is actually per hardware context. When user space submits a batchbuffer, the kernel
+walks the list of GEM buffer objects used by the batchbuffer and guarantees
+that the memory of each such GEM buffer object is not only resident but
+also present in the (PP)GTT. If a GEM buffer object is not yet placed in
+the (PP)GTT, then it is given an address. Two consequences of this are that
+the kernel needs to edit the submitted batchbuffer to write the correct
+GPU address whenever a GEM BO is assigned a GPU address, and that
+the kernel might evict a different GEM BO from the (PP)GTT to make address
+room for the new GEM BO.
+
+Consequently, the ioctls submitting a batchbuffer for execution also include
+a list of all locations within buffers that refer to GPU addresses so that the
+kernel can edit the buffer correctly. This process is dubbed relocation. The
+ioctls allow user space to provide what the GPU address could be. If the kernel
+sees that the address provided by user space is correct, then it skips performing
+relocation for that GEM buffer object. In addition, the ioctls report back the
+addresses to which the kernel relocated each GEM buffer object.
+
+There is also an interface for user space to directly specify the address
+of GEM BOs. This feature is called soft-pinning and is enabled within an execbuffer2
+ioctl by setting the ``EXEC_OBJECT_PINNED`` bit. If user space also specifies ``I915_EXEC_NO_RELOC``,
+then the kernel does not perform any relocation and user space manages the address
+space for its PPGTT itself. The advantage of user space handling the address space is
+that the kernel does far less work and user space can safely assume that
+the locations of GEM buffer objects in GPU address space do not change.
+
+Starting with Gen6, Intel GPUs support hardware contexts. A GPU hardware context
+represents GPU state that can be saved and restored. When user space uses a hardware
+context, it does not need to restore the GPU state at the start of each batchbuffer
+because the kernel directs the GPU to load the state from the hardware context.
+Hardware contexts allow for much greater isolation between processes that use the GPU.
+
+Batchbuffer Submission
+----------------------
+
+Depending on GPU generation, the i915 kernel driver will submit batchbuffers
+in one of several ways. However, the top-level code logic is shared by all
+methods. The key function, i915_gem_do_execbuffer(), essentially converts
+the ioctl command to an internal data structure which is then added to a queue
+which is processed elsewhere to give the job to the GPU; the details of
+i915_gem_do_execbuffer() are covered in `Common Code`_.
+
+
+Common Code
+~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :doc: User command execution
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :functions: i915_gem_do_execbuffer
+
+Batchbuffer Submission Varieties
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As stated before, there are several ways to submit batchbuffers to the GPU;
+which one is in use is controlled by function pointer values in the C struct intel_engine_cs
+(defined in drivers/gpu/drm/i915/intel_ringbuffer.h):
+
+- request_alloc
+- submit_request
+
+The three varieties for submitting batchbuffers to the GPU are the following.
+
+1. Batchbuffers are submitted directly to a ring buffer; this is the most basic way to submit batchbuffers to the GPU and is used for generations strictly before Gen8. When batchbuffers are submitted this older way, their contents are checked via Batchbuffer Parsing, see `Batchbuffer Parsing`_.
+2. Batchbuffers are submitted via execlists, a feature supported by Gen8 and newer devices; the macro :c:macro:`HAS_EXECLISTS` is used to determine if a GPU supports submitting via execlists, see `Logical Rings, Logical Ring Contexts and Execlists`_.
+3. Batchbuffers are submitted to the GuC, see `GuC`_.
+
+
+
 Batchbuffer Parsing
 -------------------
 
@@ -266,6 +374,7 @@ Batchbuffer Pools
 
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
    :internal:
 
 Logical Rings, Logical Ring Contexts and Execlists
 --------------------------------------------------
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b15305f2fb76..4a22ae86ceb3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2178,6 +2178,16 @@ signal_fence_array(struct i915_execbuffer *eb,
 	}
 }
 
+/**
+ * i915_gem_do_execbuffer() - Batchbuffer submission common implementation
+ *
+ * All ioctls for submitting a batchbuffer reduce to this function.
+ * This function places the batchbuffer to be executed on a submission
+ * queue, which will later (via an interrupt calling into the i915 driver)
+ * send the batchbuffer to the GPU.
+ *
+ * Return: 0 on success, error code on failure
+ */
 static int
 i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_file *file,
-- 
2.16.1
