Re: [PATCH 1/1 RFC] drivers/gpu/drm/i915:Documentation for batchbuffer submission

These documentation improvements are very welcome; here are a few
comments from me.

Quoting kevin.rogovin@xxxxxxxxx (2018-02-16 16:04:22)
> +Intel GPU Basics
> +----------------
> +
> +An Intel GPU has multiple engines. There are several engine types.
> +
> +- RCS is the engine for rendering 3D and performing compute; it is named `I915_EXEC_DEFAULT` in user space.

I'd call out the existence of I915_EXEC_RENDER here and introduce
I915_EXEC_DEFAULT on its own line.

> +- BCS is the blitting (copy) engine; it is named `I915_EXEC_BLT` in user space.
> +- VCS is the video encode and decode engine; it is named `I915_EXEC_BSD` in user space.
> +- VECS is the video enhancement engine; it is named `I915_EXEC_VEBOX` in user space.
> +
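For reference, a minimal sketch of how the engine selector in the execbuffer2 flags word maps to the engines listed above. The selector values mirror include/uapi/drm/i915_drm.h; `ENGINE_SELECT_MASK`, `enum engine` and `engine_for_flags()` are hypothetical names for this sketch, not i915 code. Note that `I915_EXEC_DEFAULT` and `I915_EXEC_RENDER` both land on RCS.

```c
#include <assert.h>
#include <stdint.h>

/* Engine selectors, mirroring the values in include/uapi/drm/i915_drm.h. */
#define I915_EXEC_DEFAULT (0 << 0)
#define I915_EXEC_RENDER  (1 << 0)
#define I915_EXEC_BSD     (2 << 0)
#define I915_EXEC_BLT     (3 << 0)
#define I915_EXEC_VEBOX   (4 << 0)

/* Hypothetical for this sketch: the selector sits in the low bits of the
 * execbuffer2 flags word (the kernel uses I915_EXEC_RING_MASK for this). */
#define ENGINE_SELECT_MASK 0x7u

enum engine { RCS, BCS, VCS, VECS, ENGINE_INVALID };

/* Map the engine selector in an execbuffer2 flags word to an engine. */
static enum engine engine_for_flags(uint64_t flags)
{
    switch (flags & ENGINE_SELECT_MASK) {
    case I915_EXEC_DEFAULT:     /* legacy default: shares the RCS case */
    case I915_EXEC_RENDER: return RCS;
    case I915_EXEC_BSD:    return VCS;
    case I915_EXEC_BLT:    return BCS;
    case I915_EXEC_VEBOX:  return VECS;
    default:               return ENGINE_INVALID;
    }
}
```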
> +The Intel GPU family is a family of integrated GPUs using Unified Memory
> +Access. To have the GPU "do work", user space feeds the GPU batchbuffers
> +via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER`, `DRM_IOCTL_I915_GEM_EXECBUFFER2`
> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will instruct the

I'd also call out DRM_IOCTL_I915_GEM_EXECBUFFER as the legacy submission
method and primarily mention I915_GEM_EXECBUFFER2_WR.

> +GPU to perform work (for example rendering) and that work needs memory from
> +which to read and memory to which to write. All memory is encapsulated within
> +GEM buffer objects (usually created with the ioctl DRM_IOCTL_I915_GEM_CREATE).
> +An ioctl providing a batchbuffer for the GPU to execute also lists all GEM
> +buffer objects that the batchbuffer reads and/or writes.
> +


To keep chronological order, maybe introduce the hardware contexts first?
Only then go on to PPGTT.

> +The GPU has its own memory management and address space. The kernel driver
> +maintains the memory translation table for the GPU. For older GPUs (i.e. those
> +before Gen8), there is a single global translation table, the global
> +Graphics Translation Table (GTT). For newer generation GPUs, each hardware
> +context has its own translation table, called the Per-Process Graphics Translation
> +Table (PPGTT). Note that although PPGTT is named per-process, it
> +is actually per hardware context. When user space submits a batchbuffer, the kernel
> +walks the list of GEM buffer objects used by the batchbuffer and guarantees
> +not only that the memory of each such GEM buffer object is resident but also
> +that it is present in the (PP)GTT. If a GEM buffer object is not yet placed in
> +the (PP)GTT, then it is given an address. This has two consequences:
> +the kernel needs to edit the submitted batchbuffer to write the correct
> +GPU address wherever it refers to a GEM BO that was just assigned one, and
> +the kernel might evict a different GEM BO from the (PP)GTT to make address
> +room for a GEM BO.
> +
> +Consequently, the ioctls submitting a batchbuffer for execution also include
> +a list of all locations within buffers that refer to GPU addresses so that the
> +kernel can edit the buffers correctly. This process is dubbed relocation. The
> +ioctls allow user space to provide a guess of what each GPU address will be. If the kernel
> +sees that the address provided by user space is correct, then it skips performing
> +relocation for that GEM buffer object. In addition, the ioctls report back to user
> +space the address to which the kernel relocated each GEM buffer object.
> +
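A minimal sketch of the relocation step, assuming a simplified stand-in for `struct drm_i915_gem_relocation_entry` (the real structure also carries read/write domain fields) and a hypothetical `bound_addr` table giving where each BO actually landed. For simplicity the presumed-address check is done per relocation entry rather than per BO, and addresses are written as 64-bit values in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for struct drm_i915_gem_relocation_entry. */
struct reloc {
    uint32_t target_handle;   /* BO whose GPU address is referenced */
    uint32_t delta;           /* byte offset added to the target address */
    uint64_t offset;          /* where in the batch the address lives */
    uint64_t presumed_offset; /* user space's guess of the target address */
};

/* Patch every stale address in the batch; bound_addr[handle] is the
 * (hypothetical) address the BO was actually bound at. Returns the number
 * of entries that had to be patched. */
static size_t apply_relocs(uint8_t *batch, size_t batch_len,
                           struct reloc *relocs, size_t n,
                           const uint64_t *bound_addr)
{
    size_t patched = 0;
    for (size_t i = 0; i < n; i++) {
        struct reloc *r = &relocs[i];
        uint64_t actual = bound_addr[r->target_handle];
        if (r->presumed_offset == actual)
            continue;                 /* guess was right: nothing to do */
        assert(r->offset + sizeof(uint64_t) <= batch_len);
        uint64_t value = actual + r->delta;
        memcpy(batch + r->offset, &value, sizeof(value));
        r->presumed_offset = actual;  /* report the real address back */
        patched++;
    }
    return patched;
}
```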
> +There is also an interface for user space to directly specify the address location
> +of GEM BOs: a feature called soft-pinning, enabled for an object within an execbuffer2
> +ioctl by setting its EXEC_OBJECT_PINNED bit. If user space also specifies I915_EXEC_NO_RELOC,
> +then the kernel does not perform any relocation and user space manages the address
> +space for its PPGTT itself. The advantage of user space handling the address space is
> +that the kernel then does far less work and user space can safely assume that
> +a GEM buffer object's location in GPU address space does not change.
> +
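With soft-pinning, user space picks the addresses itself. A minimal sketch of one way to do that, assuming a hypothetical bump allocator over a private, page-aligned GPU address range; the returned address is what user space would place in the object's `offset` field alongside EXEC_OBJECT_PINNED. `struct va_allocator`, `va_alloc()` and `ranges_overlap()` are illustrative names only, and address 0 doubles as the failure value, so the range is assumed not to start at 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u

/* Hypothetical user-space allocator for soft-pinned BOs: hands out
 * non-overlapping, page-aligned addresses so nothing ever has to move. */
struct va_allocator {
    uint64_t next; /* next free GPU address (page aligned, non-zero) */
    uint64_t end;  /* end of the range user space manages */
};

/* Returns a GPU address for a BO of `size` bytes, or 0 if out of space. */
static uint64_t va_alloc(struct va_allocator *va, uint64_t size)
{
    uint64_t room = va->end - va->next;
    if (size == 0 || size > room)
        return 0;                     /* empty request or range exhausted */
    uint64_t aligned = (size + PAGE_SIZE_4K - 1) & ~(uint64_t)(PAGE_SIZE_4K - 1);
    if (aligned > room)
        return 0;
    uint64_t addr = va->next;
    va->next += aligned;
    return addr;
}

/* True if [a, a + as) and [b, b + bs) overlap; pinned BOs must not. */
static bool ranges_overlap(uint64_t a, uint64_t as, uint64_t b, uint64_t bs)
{
    return a < b + bs && b < a + as;
}
```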
> +Starting in Gen6, Intel GPUs support hardware contexts. A GPU hardware context
> +represents GPU state that can be saved and restored. When user space uses a hardware
> +context, it does not need to restore the GPU state at the start of each batchbuffer
> +because the kernel directs the GPU to load the state from the hardware context.
> +Hardware contexts allow for much greater isolation between processes that use the GPU.
> +
> +Batchbuffer Submission
> +----------------------
> +
> +Depending on GPU generation, the i915 kernel driver will submit batchbuffers
> +in one of several ways. However, the top-level code logic is shared by all
> +methods. The key function, i915_gem_do_execbuffer(), essentially converts
> +the ioctl command to an internal data structure which is then added to a queue
> +that is processed elsewhere to hand the job to the GPU; the details of
> +i915_gem_do_execbuffer() are covered in `Common Code`_.
> +
> +
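The "convert, then queue for processing elsewhere" flow can be sketched as a bounded FIFO between the ioctl path and the submission backend. Everything here (`struct request`, `queue_request()`, `dequeue_request()`, `QUEUE_LEN`) is hypothetical and far simpler than the real i915 request tracking; it only shows the ioctl side returning as soon as the job is queued, with a separate consumer handing it to the GPU later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical internal request built from an execbuffer ioctl. */
struct request {
    uint32_t ctx_id;     /* hardware context the batch runs in */
    uint64_t batch_addr; /* GPU address of the batchbuffer */
};

#define QUEUE_LEN 4u /* power of two, so free-running indices wrap cleanly */

struct request_queue {
    struct request slots[QUEUE_LEN];
    uint32_t head; /* next request to hand to the GPU */
    uint32_t tail; /* next free slot */
};

/* Ioctl side: queue the request and return without waiting for the GPU. */
static bool queue_request(struct request_queue *q, struct request rq)
{
    if (q->tail - q->head == QUEUE_LEN)
        return false;                 /* full: caller must back off */
    q->slots[q->tail++ % QUEUE_LEN] = rq;
    return true;
}

/* Backend side, run elsewhere: take the oldest pending request. */
static bool dequeue_request(struct request_queue *q, struct request *out)
{
    if (q->head == q->tail)
        return false;                 /* nothing pending */
    *out = q->slots[q->head++ % QUEUE_LEN];
    return true;
}
```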
> +Common Code
> +~~~~~~~~~~~
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +   :doc: User command execution
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +   :functions: i915_gem_do_execbuffer

I'm not sure about referring to internal functions, as they're bound to
change often. No strong feelings on this; I just see that this will be easy to
miss when changing the related code.

> +
> +Batchbuffer Submission Varieties
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +As stated before, there are several varieties in how to submit batchbuffers to the GPU;
> +which one is in use is controlled by function pointers in the C struct intel_engine_cs
> +(defined in drivers/gpu/drm/i915/intel_ringbuffer.h):
> +
> +- request_alloc
> +- submit_request

Same here. As this lives in a separate file, I'm not sure this level
of detail is going to be kept up to date when changing the actual code.
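To illustrate the function-pointer dispatch described in the patch, a hypothetical sketch: a stand-in for `struct intel_engine_cs` holding only the two hooks, with the backend chosen once per engine by generation while the top-level submit path stays shared. The real hook signatures differ. The quoted list is cut off after the first variety, so the Gen8+ backend here is assumed to be execlists, and the third variety is not modeled; `toy_engine`, `toy_submit()` and friends are illustrative names.

```c
#include <assert.h>
#include <string.h>

struct toy_request;

/* Simplified stand-in for struct intel_engine_cs: only the two hooks. */
struct toy_engine {
    const char *name;
    int  (*request_alloc)(struct toy_request *rq);
    void (*submit_request)(struct toy_request *rq);
};

struct toy_request {
    struct toy_engine *engine;
    const char *submitted_via; /* records which backend ran, for the demo */
};

/* Legacy backend: write the batch start into the ring buffer. */
static int ring_request_alloc(struct toy_request *rq) { (void)rq; return 0; }
static void ring_submit(struct toy_request *rq) { rq->submitted_via = "ring buffer"; }

/* Assumed Gen8+ backend: execlists. */
static int execlists_request_alloc(struct toy_request *rq) { (void)rq; return 0; }
static void execlists_submit(struct toy_request *rq) { rq->submitted_via = "execlists"; }

/* Pick the backend once, at engine setup, based on the GPU generation. */
static void toy_engine_init(struct toy_engine *e, const char *name, int gen)
{
    e->name = name;
    if (gen < 8) {
        e->request_alloc = ring_request_alloc;
        e->submit_request = ring_submit;
    } else {
        e->request_alloc = execlists_request_alloc;
        e->submit_request = execlists_submit;
    }
}

/* Shared top-level path: identical for every backend. */
static int toy_submit(struct toy_engine *e, struct toy_request *rq)
{
    rq->engine = e;
    int err = e->request_alloc(rq);
    if (err)
        return err;
    e->submit_request(rq);
    return 0;
}
```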

> +
> +The three varieties for submitting batchbuffers to the GPU are the following.
> +
> +1. Batchbuffers are submitted directly to a ring buffer; this is the most basic way to submit batchbuffers to the GPU and is for generations strictly before Gen8. When batchbuffers are submitted this older way, their contents are checked via Batchbuffer Parsing, see `Batchbuffer Parsing`_.

Just for editing and reading pleasure, there must be a way of cutting
long lines in lists.

But more importantly, do refer to it as Command Parser/Parsing, as the code
uses "cmd parser", a.k.a. command parser, extensively.

Regards, Joonas
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx