On Wed, 2024-09-11 at 13:22 +0100, Tvrtko Ursulin wrote:
> 
> On 10/09/2024 11:25, Philipp Stanner wrote:
> > On Mon, 2024-09-09 at 18:19 +0100, Tvrtko Ursulin wrote:
> > > From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
> > > 
> > > Having removed one re-lock cycle on the entity->lock in a patch
> > > titled "drm/sched: Optimise drm_sched_entity_push_job", with only
> > > a tiny bit larger refactoring we can do the same optimisation on
> > > the rq->lock. (Currently both drm_sched_rq_add_entity() and
> > > drm_sched_rq_update_fifo_locked() take and release the same
> > > lock.)
> > > 
> > > To achieve this we rename drm_sched_rq_add_entity() to
> > > drm_sched_rq_add_entity_locked(), making it expect the rq->lock
> > > to be held, and also add the same expectation to
> > > drm_sched_rq_update_fifo_locked().
> > > 
> > > For more streamlining we also add the run-queue as an explicit
> > > parameter to drm_sched_rq_remove_fifo_locked() to avoid both
> > > callers and callee having to dereference entity->rq.
> > 
> > Why is dereferencing it a problem?
> 
> As you have noticed below the API is a bit unsightly. Consider for
> example this call chain:
> 
> drm_sched_entity_kill(entity)
>   drm_sched_rq_remove_entity(entity->rq, entity);
>     drm_sched_rq_remove_fifo_locked(entity);
>       struct drm_sched_rq *rq = entity->rq;
> 
> A bit confusing, no?
> 
> I thought adding rq to remove_fifo_locked at least removes one back
> and forth between the entity->rq and rq.
> 
> And then if we cache the rq in a local variable, after having
> explicitly taken the correct lock, we have this other call chain
> example:
> 
> drm_sched_entity_push_job()
>   ...
>   rq = entity->rq;
>   spin_lock(&rq->lock);
> 
>   drm_sched_rq_add_entity_locked(rq, entity);
>   drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
> 
>   spin_unlock(&rq->lock);
> 
> To me at least this reads more streamlined.
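For readers following the thread, here is a condensed sketch of the
double re-lock being removed. This is illustrative only, not the
actual drm/sched code (the real helpers live in
drivers/gpu/drm/scheduler/sched_main.c):

        /* Illustrative sketch only, not the actual drm/sched code. */

        /* Before the patch, drm_sched_entity_push_job() effectively did: */
        spin_lock(&rq->lock);           /* inside drm_sched_rq_add_entity() */
        list_add_tail(&entity->list, &rq->entities);
        spin_unlock(&rq->lock);

        spin_lock(&rq->lock);           /* again, inside ..._update_fifo_locked() */
        entity->oldest_job_waiting = submit_ts; /* plus the rb-tree reinsertion */
        spin_unlock(&rq->lock);

        /* After the patch, the caller holds the lock across both steps: */
        spin_lock(&rq->lock);
        drm_sched_rq_add_entity_locked(rq, entity);
        drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
        spin_unlock(&rq->lock);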
Alright, doesn't sound too bad, but

> > > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
> > > Cc: Christian König <christian.koenig@xxxxxxx>
> > > Cc: Alex Deucher <alexander.deucher@xxxxxxx>
> > > Cc: Luben Tuikov <ltuikov89@xxxxxxxxx>
> > > Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
> > > Cc: Philipp Stanner <pstanner@xxxxxxxxxx>
> > > ---
> > >  drivers/gpu/drm/scheduler/sched_entity.c |  7 ++--
> > >  drivers/gpu/drm/scheduler/sched_main.c   | 41 +++++++++++++-----------
> > >  include/drm/gpu_scheduler.h              |  7 ++--
> > >  3 files changed, 31 insertions(+), 24 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> > > index b4c4f9923e0b..2102c726d275 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_entity.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> > > @@ -614,11 +614,14 @@ void drm_sched_entity_push_job(struct drm_sched_job *sched_job)
> > >          sched = rq->sched;
> > >  
> > >          atomic_inc(sched->score);
> > > -        drm_sched_rq_add_entity(rq, entity);
> > > +
> > > +        spin_lock(&rq->lock);
> > > +        drm_sched_rq_add_entity_locked(rq, entity);
> > >  
> > >          if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > > -                drm_sched_rq_update_fifo_locked(entity, submit_ts);
> > > +                drm_sched_rq_update_fifo_locked(entity, rq, submit_ts);
> > >  
> > > +        spin_unlock(&rq->lock);
> > >          spin_unlock(&entity->lock);
> > >  
> > >          drm_sched_wakeup(sched, entity);
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 937e7d1cfc49..1ccd2aed2d32 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -153,41 +153,44 @@ static __always_inline bool drm_sched_entity_compare_before(struct rb_node *a,
> > >          return ktime_before(ent_a->oldest_job_waiting, ent_b->oldest_job_waiting);
> > >  }
> > >  
> > > -static inline void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity)
> > > +static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
> > > +                                            struct drm_sched_rq *rq)

I would then at least like to see a comment somewhere telling the
reader why rq is taken as a separate parameter. One might otherwise
easily wonder why it's not obtained through the entity and what the
difference is.

> > 
> > So here we'd add a new function parameter that still doesn't allow
> > for getting rid of 'entity' as a parameter.
> 
> We can't get rid of the entity.
> 
> Maaaybe instead we could get rid of the rq in the whole chain, I mean
> from drm_sched_rq_add_entity and drm_sched_rq_remove_entity to start
> with. Let's postpone that.
> 
> But then to remove the double re-lock we still (like in this patch)
> need to make the callers take the locks and rename the helpers with a
> _locked suffix. Otherwise it would be inconsistent that a lock is
> taken outside helpers which have no _locked suffix.
> 
> I am not sure if that is better. All it achieves is remove the rq as
> an explicit parameter by making the callees dereference it from the
> entity.

OK, as I see it now it would actually be desirable to have the suffix
_locked indicate that the caller must hold all necessary locks. So
your patch would actually make that consistent within drm/sched/.

Looks good

P.

> Worst part is all these helpers have the drm_sched_rq_ prefix... which
> to me reads as "we operate on rq". So not passing in rq is confusing
> to start with.
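One way to address the request above for a comment explaining why rq
is a separate parameter would be a sanity check that documents the
relationship in code. A hypothetical sketch, not part of the posted
patch, reusing the remove helper from the diff:

        static void drm_sched_rq_remove_fifo_locked(struct drm_sched_entity *entity,
                                                    struct drm_sched_rq *rq)
        {
                /* Hypothetical sanity check, not in the posted patch:
                 * the rq the caller passed (and locked) must be the
                 * entity's current run queue.
                 */
                WARN_ON_ONCE(rq != entity->rq);

                if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
                        rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
                        RB_CLEAR_NODE(&entity->rb_tree_node);
                }
        }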
> 
> Granted, some confusion still remains with my approach since
> ideally, to those helpers, I wanted to add some asserts that
> rq == entity->rq...
> 
> > The API gets larger that way and readers will immediately wonder
> > why something is passed as a separate variable that could also be
> > obtained through the pointer.
> > 
> > >  {
> > > -        struct drm_sched_rq *rq = entity->rq;
> > > -
> > >          if (!RB_EMPTY_NODE(&entity->rb_tree_node)) {
> > >                  rb_erase_cached(&entity->rb_tree_node, &rq->rb_tree_root);
> > >                  RB_CLEAR_NODE(&entity->rb_tree_node);
> > >          }
> > >  }
> > >  
> > > -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts)
> > > +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
> > > +                                     struct drm_sched_rq *rq,
> > > +                                     ktime_t ts)
> > 
> > The function is still called _locked. That implies to the reader
> > that this function takes care of locking. But it doesn't anymore.
> > Instead,
> > 
> > >  {
> > >          lockdep_assert_held(&entity->lock);
> > > +        lockdep_assert_held(&rq->lock);
> > >  
> > > -        spin_lock(&entity->rq->lock);
> > > -
> > > -        drm_sched_rq_remove_fifo_locked(entity);
> > > +        drm_sched_rq_remove_fifo_locked(entity, rq);
> > >  
> > >          entity->oldest_job_waiting = ts;
> > >  
> > > -        rb_add_cached(&entity->rb_tree_node, &entity->rq->rb_tree_root,
> > > +        rb_add_cached(&entity->rb_tree_node, &rq->rb_tree_root,
> > >                        drm_sched_entity_compare_before);
> > > -
> > > -        spin_unlock(&entity->rq->lock);
> > >  }
> > >  
> > >  void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts)
> > >  {
> > > +        struct drm_sched_rq *rq;
> > > +
> > >          /*
> > >           * Both locks need to be grabbed, one to protect from entity->rq change
> > >           * for entity from within concurrent drm_sched_entity_select_rq and the
> > >           * other to update the rb tree structure.
> > >           */
> > >          spin_lock(&entity->lock);
> > > -        drm_sched_rq_update_fifo_locked(entity, ts);
> > > +        rq = entity->rq;
> > > +        spin_lock(&rq->lock);
> > > +        drm_sched_rq_update_fifo_locked(entity, rq, ts);
> > > +        spin_unlock(&rq->lock);
> > 
> > its caller, drm_sched_rq_update_fifo(), now takes care of the
> > locking. So if at all, drm_sched_rq_update_fifo_locked() should be
> > called drm_sched_rq_update_fifo_unlocked().
> > 
> > If such a change is really being done, we have to go through the
> > entire scheduler and make sure that the suffix "_locked" is used
> > consistently throughout the scheduler. And even better, as
> > consistent with the kernel as possible.
> 
> Use of _locked follows the existing pattern of
> drm_sched_rq_remove_fifo_locked :shrug:
> 
> Are you referring to drm_sched_start_timeout(_unlocked) which is the
> opposite pattern? (Although a more recent addition.)
> 
> The wider kernel also uses both patterns so I don't know. Would be
> nice to align in the scheduler but drm_sched_start_timeout is out of
> scope for this series.
> 
> > To be honest folks, I don't think this entire "optimization" patch
> > is that much of a good idea. The scheduler has real, big problems,
> > such as race conditions, memory leaks and lack of documentation.
> > 
> > I think we should for the foreseeable future dedicate our attention
> > towards solving those problems, instead of optimizing things.
> > Especially if the optimization might decrease readability as with
> > the naming here.
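The convention being settled on above — a _locked suffix meaning "the
caller must hold all necessary locks", with an unlocked wrapper doing
the locking — reduces to the following pattern. A sketch with made-up
foo_* names, not drm/sched API:

        /* Sketch with hypothetical names; the pattern, not the real API. */
        static void foo_update_locked(struct foo_rq *rq)
        {
                lockdep_assert_held(&rq->lock); /* caller must hold rq->lock */

                /* ... modify state protected by rq->lock ... */
        }

        static void foo_update(struct foo_rq *rq)
        {
                spin_lock(&rq->lock);           /* unlocked variant locks itself */
                foo_update_locked(rq);
                spin_unlock(&rq->lock);
        }

With lockdep enabled the assert becomes a runtime check; otherwise it
compiles away, so the contract is both documented and verified in
debug builds at no production cost.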
> 
> In principle I agree, but on the other hand let's first see if this
> patch is really making things any worse, or is perhaps just
> maintaining the status quo in the API elegance department, while at
> the same time removing the quite lazy double re-lock from the main
> submission path.
> 
> Regards,
> 
> Tvrtko
> 
> > P.
> > 
> > >          spin_unlock(&entity->lock);
> > >  }
> > >  
> > > @@ -210,25 +213,23 @@ static void drm_sched_rq_init(struct drm_gpu_scheduler *sched,
> > >  }
> > >  
> > >  /**
> > > - * drm_sched_rq_add_entity - add an entity
> > > + * drm_sched_rq_add_entity_locked - add an entity
> > >   *
> > >   * @rq: scheduler run queue
> > >   * @entity: scheduler entity
> > >   *
> > >   * Adds a scheduler entity to the run queue.
> > >   */
> > > -void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
> > > -                             struct drm_sched_entity *entity)
> > > +void drm_sched_rq_add_entity_locked(struct drm_sched_rq *rq,
> > > +                                    struct drm_sched_entity *entity)
> > >  {
> > > +        lockdep_assert_held(&rq->lock);
> > > +
> > >          if (!list_empty(&entity->list))
> > >                  return;
> > >  
> > > -        spin_lock(&rq->lock);
> > > -
> > >          atomic_inc(rq->sched->score);
> > >          list_add_tail(&entity->list, &rq->entities);
> > > -
> > > -        spin_unlock(&rq->lock);
> > >  }
> > >  
> > >  /**
> > > @@ -242,6 +243,8 @@ void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
> > >  void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> > >                                  struct drm_sched_entity *entity)
> > >  {
> > > +        lockdep_assert_held(&entity->lock);
> > > +
> > >          if (list_empty(&entity->list))
> > >                  return;
> > >  
> > > @@ -254,7 +257,7 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> > >                  rq->current_entity = NULL;
> > >  
> > >          if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
> > > -                drm_sched_rq_remove_fifo_locked(entity);
> > > +                drm_sched_rq_remove_fifo_locked(entity, rq);
> > >  
> > >          spin_unlock(&rq->lock);
> > >  }
> > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > index 5a1e4c803b90..2ad33e2fe2d2 100644
> > > --- a/include/drm/gpu_scheduler.h
> > > +++ b/include/drm/gpu_scheduler.h
> > > @@ -591,13 +591,14 @@ bool drm_sched_dependency_optimized(struct dma_fence* fence,
> > >                                      struct drm_sched_entity *entity);
> > >  void drm_sched_fault(struct drm_gpu_scheduler *sched);
> > >  
> > > -void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
> > > -                             struct drm_sched_entity *entity);
> > > +void drm_sched_rq_add_entity_locked(struct drm_sched_rq *rq,
> > > +                                    struct drm_sched_entity *entity);
> > >  void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
> > >                                  struct drm_sched_entity *entity);
> > >  
> > >  void drm_sched_rq_update_fifo(struct drm_sched_entity *entity, ktime_t ts);
> > > -void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity, ktime_t ts);
> > > +void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
> > > +                                     struct drm_sched_rq *rq, ktime_t ts);
> > >  
> > >  int drm_sched_entity_init(struct drm_sched_entity *entity,
> > >                            enum drm_sched_priority priority,
> > > 
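For reference, the lock nesting order the patched paths above rely
on — drm_sched_rq_update_fifo() and drm_sched_entity_push_job() both
take entity->lock first, then rq->lock — condensed into a sketch
(illustrative, not a verbatim excerpt):

        /* Condensed from the patched callers; always this order,
         * never reversed.
         */
        spin_lock(&entity->lock);       /* outer: keeps entity->rq stable against
                                         * concurrent drm_sched_entity_select_rq() */
        rq = entity->rq;                /* safe to read under entity->lock */
        spin_lock(&rq->lock);           /* inner: protects rq->entities and the rb-tree */
        /* ... list and rb-tree updates ... */
        spin_unlock(&rq->lock);
        spin_unlock(&entity->lock);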