On Thu, Nov 16, 2023 at 12:52 PM Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
>
> Start to improve the scheduler document. Especially document the
> lifetime of each of the objects as well as the restrictions around
> DMA-fence handling and userspace compatibility.

A few minor grammatical suggestions below.

>
> v2: Some improvements suggested by Danilo, add section about error
> handling.
>
> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> ---
>  Documentation/gpu/drm-mm.rst           |  36 +++++
>  drivers/gpu/drm/scheduler/sched_main.c | 174 +++++++++++++++++++++----
>  2 files changed, 188 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
> index acc5901ac840..112463fa9f3a 100644
> --- a/Documentation/gpu/drm-mm.rst
> +++ b/Documentation/gpu/drm-mm.rst
> @@ -552,12 +552,48 @@ Overview
>  .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
>     :doc: Overview
>
> +Job Object
> +----------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Job Object
> +
> +Entity Object
> +-------------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Entity Object
> +
> +Hardware Fence Object
> +---------------------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Hardware Fence Object
> +
> +Scheduler Fence Object
> +----------------------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Scheduler Fence Object
> +
> +Scheduler and Run Queue Objects
> +-------------------------------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Scheduler and Run Queue Objects
> +
>  Flow Control
>  ------------
>
>  .. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
>     :doc: Flow Control
>
> +Error and Timeout handling
> +--------------------------
> +
> +.. kernel-doc:: drivers/gpu/drm/scheduler/sched_main.c
> +   :doc: Error and Timeout handling
> +
>  Scheduler Function References
>  -----------------------------
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 044a8c4875ba..026123497b0e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -24,28 +24,122 @@
>  /**
>   * DOC: Overview
>   *
> - * The GPU scheduler provides entities which allow userspace to push jobs
> - * into software queues which are then scheduled on a hardware run queue.
> - * The software queues have a priority among them. The scheduler selects the entities
> - * from the run queue using a FIFO. The scheduler provides dependency handling
> - * features among jobs. The driver is supposed to provide callback functions for
> - * backend operations to the scheduler like submitting a job to hardware run queue,
> - * returning the dependencies of a job etc.
> - *
> - * The organisation of the scheduler is the following:
> - *
> - * 1. Each hw run queue has one scheduler
> - * 2. Each scheduler has multiple run queues with different priorities
> - *    (e.g., HIGH_HW,HIGH_SW, KERNEL, NORMAL)
> - * 3. Each scheduler run queue has a queue of entities to schedule
> - * 4. Entities themselves maintain a queue of jobs that will be scheduled on
> - *    the hardware.
> - *
> - * The jobs in a entity are always scheduled in the order that they were pushed.
> - *
> - * Note that once a job was taken from the entities queue and pushed to the
> - * hardware, i.e. the pending queue, the entity must not be referenced anymore
> - * through the jobs entity pointer.
> + * The GPU scheduler implements some logic to decide which command submission

s/some//

> + * to push next to the hardware. Another major use case of the GPU scheduler

s/use case/feature/

> + * is to enforce correct driver behavior around those command submissions.
> + * Because of this it's also used by drivers which don't need the actual

s/it's/it can also/

> + * scheduling functionality.
> + *
> + * All callbacks the driver needs to implement are restricted by DMA-fence
> + * signaling rules to guarantee deadlock free forward progress. This especially

s/especially//

> + * means that for normal operation no memory can be allocated in a callback.
> + * All memory which is needed for pushing the job to the hardware must be
> + * allocated before arming a job. It also means that no locks can be taken

s/also//

> + * under which memory might be allocated as well.
> + *
> + * Memory which is optional to allocate, for example for device core dumping or
> + * debugging, *must* be allocated with GFP_NOWAIT and appropriate error
> + * handling taking if that allocation fails. GFP_ATOMIC should only be used if

s/taking/taken/

> + * absolutely necessary since dipping into the special atomic reserves is
> + * usually not justified for a GPU driver.
> + */
> +
> +/**
> + * DOC: Job Object
> + *
> + * The base job object contains submission dependencies in the form of DMA-fence
> + * objects. Drivers can also implement an optional prepare_job callback which
> + * returns additional dependencies as DMA-fence objects. It's important to note

s/it's/it is/

> + * that this callback can't allocate memory or grab locks under which memory is
> + * allocated.
> + *
> + * Drivers should use this as base class for an object which contains the

s/this as/this object as/

> + * necessary state to push the command submission to the hardware.
> + *
> + * The lifetime of the job object should at least be from pushing it into the

s/should at least be/should span/

> + * scheduler until the scheduler notes through the free callback that a job
> + * isn't needed any more. Drivers can of course keep their job object alive
> + * longer than that, but that's outside of the scope of the scheduler
> + * component. Job initialization is split into two parts, drm_sched_job_init()
> + * and drm_sched_job_arm(). It's important to note that after arming a job

Describe why we have job_init() and job_arm().

> + * drivers must follow the DMA-fence rules and can't easily allocate memory
> + * or takes locks under which memory is allocated.

s/takes/take/
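
Related to my comment above about job_init()/job_arm(): maybe a tiny example
of the intended flow would make the split easier to grasp for driver writers.
Completely untested sketch, the foo_* names are made up and it assumes the
current three-argument drm_sched_job_init():

#include <drm/gpu_scheduler.h>

/* Made-up example driver job, embedding the scheduler job as base class. */
struct foo_job {
	struct drm_sched_job base;
	/* driver private state needed later by run_job() */
};

static int foo_submit(struct foo_job *job, struct drm_sched_entity *entity,
		      void *owner)
{
	int ret;

	/*
	 * Phase 1: drm_sched_job_init() just ties the job to the entity.
	 * Blocking allocations and locks under which memory is allocated
	 * are still fine here, so reserve everything run_job() will need
	 * between init and arm.
	 */
	ret = drm_sched_job_init(&job->base, entity, owner);
	if (ret)
		return ret;

	/* ... allocate ring space, fences, etc. for run_job() here ... */

	/*
	 * Phase 2: arming initializes the scheduler fences and publishes
	 * the job, so from here on the DMA-fence signaling rules apply and
	 * nothing may block on memory allocation any more.
	 */
	drm_sched_job_arm(&job->base);
	drm_sched_entity_push_job(&job->base);

	return 0;
}

Feel free to drop it if you think that's too much detail for an overview
document.
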
> + */
> +
> +/**
> + * DOC: Entity Object
> + *
> + * The entity object which is a container for jobs which should execute

s/which//

> + * sequentially. Drivers should create an entity for each individual context
> + * they maintain for command submissions which can run in parallel.
> + *
> + * The lifetime of the entity should *not* exceed the lifetime of the
> + * userspace process it was created for and drivers should call the
> + * drm_sched_entity_flush() function from their file_operations.flush
> + * callback. So it's possible that an entity object is not alive any
> + * more while jobs from it are still running on the hardware.

s/So it's possible that an entity object is not alive any more/It is possible
that an entity may be freed/

> + *
> + * Background is that for compatibility reasons with existing

s/Background is that for/For/

> + * userspace all results of a command submission should become visible
> + * externally even after after a process exits. This is normal POSIX behavior

s/after after/after/

> + * for I/O operations.
> + *
> + * The problem with this approach is that GPU submissions contain executable
> + * shaders enabling processes to evade their termination by offloading work to
> + * the GPU. So when a process is terminated with a SIGKILL the entity object

s/The problem with this approach is that GPU submissions contain executable
shaders enabling processes to evade their termination by offloading work to
the GPU/ GPU submissions contain work executing on the GPU which leaves an
aspect of the process behind even after process termination/

> + * makes sure that jobs are freed without running them while still maintaining
> + * correct sequential order for signaling fences.
> + */
> +
> +/**
> + * DOC: Hardware Fence Object
> + *
> + * The hardware fence object is a DMA-fence provided by the driver as result of
> + * running jobs. Drivers need to make sure that the normal DMA-fence semantics
> + * are followed for this object. It's important to note that the memory for

s/It's/It is/

> + * this object can *not* be allocated in the run_job callback since that would
> + * violate the requirements for the DMA-fence implementation. The scheduler
> + * maintains a timeout handler which triggers if this fence doesn't signal in
> + * a configurable time frame.
> + *
> + * The lifetime of this object follows DMA-fence ref-counting rules, the
> + * scheduler takes ownership of the reference returned by the driver and drops
> + * it when it's not needed any more.
> + */
> +
> +/**
> + * DOC: Scheduler Fence Object
> + *
> + * The scheduler fence object which encapsulates the whole time from pushing

s/which//

> + * the job into the scheduler until the hardware has finished processing it.
> + * This is internally managed by the scheduler, but drivers can grab additional
> + * reference to it after arming a job. The implementation provides DMA-fence
> + * interfaces for signaling both scheduling of a command submission as well as
> + * finishing of processing.
> + *
> + * The lifetime of this object also follows normal DMA-fence ref-counting
> + * rules. The finished fence is the one normally exposed outside of the
> + * scheduler, but the driver can grab references to both the scheduled as well
> + * as the finished fence when needed for pipe-lining optimizations.

s/pipe-lining/pipelining/

> + */
> +
> +/**
> + * DOC: Scheduler and Run Queue Objects
> + *
> + * The scheduler object itself does the actual work of selecting a job and

s/itself//

> + * pushing it to the hardware. Both FIFO and RR selection algorithm are

s/FIFO and RR selection algorithm/FIFO and Round Robin (RR) selection algorithms/

> + * supported, but FIFO is preferred for many use cases.

s/is preferred for many use cases/is the default and preferred for most use cases/

> + *
> + * The lifetime of the scheduler is managed by the driver using it. Before
> + * destroying the scheduler the driver must ensure that all hardware processing

s/scheduler the/scheduler, the/

> + * involving this scheduler object has finished by calling for example
> + * disable_irq(). It is *not* sufficient to wait for the hardware fence here
> + * since this doesn't guarantee that all callback processing has finished.

I'm not quite sure what you are trying to say here. I think the idea is that
the driver must wait for all signalling mechanisms (IRQ handlers, callbacks,
etc.) that might need access to the scheduler object.
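
If that's the intent, maybe spelling out the teardown order with a small
example would help. Again just an untested sketch with a made-up foo_device:

#include <drm/gpu_scheduler.h>
#include <linux/interrupt.h>
#include <linux/workqueue.h>

/* Made-up example device, just to illustrate the ordering. */
struct foo_device {
	struct drm_gpu_scheduler sched;
	struct work_struct fence_work;
	unsigned int irq;
};

static void foo_device_fini(struct foo_device *fdev)
{
	/* First silence the interrupt which signals the hardware fences ... */
	disable_irq(fdev->irq);

	/*
	 * ... then wait for anything which might still poke the scheduler,
	 * e.g. a worker signaling fences ...
	 */
	cancel_work_sync(&fdev->fence_work);

	/* ... and only now it is safe to tear the scheduler down. */
	drm_sched_fini(&fdev->sched);
}
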
> + *
> + * The run queue object is a container of entities for a certain priority
> + * level. This object is internally managed by the scheduler and drivers

s/This object is/These objects are/

> + * shouldn't touch them directly. The lifetime of run queues are bound to the
> + * schedulers lifetime.
>  */
>
>  /**
> @@ -72,6 +166,42 @@
>   * limit.
>   */
>
> +/**
> + * DOC: Error and Timeout handling
> + *
> + * Errors schould be signaled by using dma_fence_set_error() on the hardware

s/schould/should/

> + * fence object before signaling it. Errors are then bubbled up from the
> + * hardware fence to the scheduler fence.
> + *
> + * The entity allows querying errors on the last run submission using the
> + * drm_sched_entity_error() function which can be used to cancel queued
> + * submissions in the run_job callback as well as preventing pushing further

s/further/additional/

> + * ones into the entity in the drivers submission function.

s/ones/jobs/
s/drivers/driver's/

> + *
> + * When the hardware fence fails to signal in a configurable amount of time the
> + * timedout_job callback is issued. The driver should then follow the procedure
> + * described on the &struct drm_sched_backend_ops.timedout_job callback (TODO:
> + * The timeout handler should probably switch to using the hardware fence as
> + * parameter instead of the job. Otherwise the handling will always race
> + * between timing out and signaling the fence).
> + *
> + * The scheduler also used to provided functionality for re-submitting jobs

s/The scheduler also used to provided/NOTE: The scheduler used to provide/

> + * with replacing the hardware fence during reset handling. This functionality

s/with replacing/while replacing/

> + * is now marked as deprecated. This has proven to be fundamentally racy and
> + * not compatible with DMA-fence rules and shouldn't be used in any new code.
> + *
> + * Additional there is the drm_sched_increase_karma() function which tries to

s/Additional/Additionally, /

> + * find the entity which submitted a job and increases it's 'karma'

s/it's/its/

> + * atomic variable to prevent re-submitting jobs from this entity. This has
> + * quite some overhead and re-submitting jobs is now marked as deprecated. So
> + * using this function is rather discouraged.

s/rather//

> + *
> + * Drivers can still re-create the GPU state should it be lost during timeout
> + * handling when they can guarantee that forward progress is made and this
> + * doesn't cause another timeout. But this is strongly hardware specific and

s/doesn't/will not/

> + * out of the scope of the general GPU scheduler.
> + */
> +
>  #include <linux/wait.h>
>  #include <linux/sched.h>
>  #include <linux/completion.h>
> --
> 2.34.1
>
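
One last thought on the "Error and Timeout handling" section: a short snippet
showing both halves of the error path might be worth adding there. Untested
sketch again, foo_* is a made-up driver:

#include <drm/gpu_scheduler.h>
#include <linux/dma-fence.h>

/*
 * Completion side: attach the error to the hardware fence *before*
 * signaling it so that it bubbles up to the scheduler fence.
 */
static void foo_job_done(struct dma_fence *hw_fence, int error)
{
	if (error)
		dma_fence_set_error(hw_fence, error);
	dma_fence_signal(hw_fence);
}

/*
 * Submission side: bail out early instead of queuing new jobs behind a
 * submission which already failed on this entity.
 */
static int foo_begin_submit(struct drm_sched_entity *entity)
{
	int r = drm_sched_entity_error(entity);

	if (r)
		return r;

	/* ... continue with drm_sched_job_init() and friends ... */
	return 0;
}
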