On 06.08.2018 02:13, Dieter Nützel wrote:
On 04.08.2018 06:18, Dieter Nützel wrote:
> On 04.08.2018 06:12, Dieter Nützel wrote:
>> On 04.08.2018 05:27, Dieter Nützel wrote:
>>> On 03.08.2018 13:09, Christian König wrote:
>>>> On 03.08.2018 at 03:08, Dieter Nützel wrote:
>>>>> Hello Christian, AMD guys,
>>>>>
>>>>> this one _together_ with the series
>>>>> [PATCH 1/7] drm/amdgpu: use new scheduler load balancing for VMs
>>>>> https://lists.freedesktop.org/archives/amd-gfx/2018-August/024802.html
>>>>> on top of amd-staging-drm-next 53d5f1e4a6d9
>>>>> freezes the whole system (Intel Xeon X3470, RX580) during the _first_ mouse move.
>>>>> The same happens on sddm login or on the first mouse move in KDE Plasma 5.
>>>>> NO logs so far. - Expected?
>>>>
>>>> Not even remotely; can you double-check which patch from the
>>>> "[PATCH 1/7] drm/amdgpu: use new scheduler load balancing for VMs"
>>>> series is causing the issue?
>>>
>>> Oops, _both_ 'series' on top of
>>> bf1fd52b0632 (origin/amd-staging-drm-next) drm/amdgpu: move gem definitions into amdgpu_gem header
>>> work without a hitch. But I have new (latest) µcode from openSUSE Tumbleweed:
>>> kernel-firmware-20180730-35.1.src.rpm
>>>
>>> Tested-by: Dieter Nützel <Dieter@xxxxxxxxxxxxx>
>>
>> I take this back. It lasted much longer this time, then the mouse froze.
>> I could grab a dmesg via remote phone ;-) See the attachment.
>>
>> Dieter
>
> Argh, shi... wrong dmesg version. It should be this one. (For sure...)

Puh, this took some time...

During the 'last' git bisect run ('first bad commit is') I got the next freeze, but I could get a new dmesg.log file via remote phone (see attachment).
git bisect log shows this:

SOURCE/amd-staging-drm-next> git bisect log
git bisect start
# good: [adebfff9c806afe1143d69a0174d4580cd27b23d] drm/scheduler: fix setting the priorty for entities
git bisect good adebfff9c806afe1143d69a0174d4580cd27b23d
# bad: [43202e67a4e6fcb0e6b773e8eb1ed56e1721e882] drm/amdgpu: use entity instead of ring for CS
git bisect bad 43202e67a4e6fcb0e6b773e8eb1ed56e1721e882
# bad: [9867b3a6ddfb73ee3105871541053f8e49949478] drm/amdgpu: use scheduler load balancing for compute CS
git bisect bad 9867b3a6ddfb73ee3105871541053f8e49949478
# good: [5d097a4591aa2be16b21adbaa19a8abb76e47ea1] drm/amdgpu: use scheduler load balancing for SDMA CS
git bisect good 5d097a4591aa2be16b21adbaa19a8abb76e47ea1
# first bad commit: [9867b3a6ddfb73ee3105871541053f8e49949478] drm/amdgpu: use scheduler load balancing for compute CS

git log --oneline
5d097a4591aa (HEAD, refs/bisect/good-5d097a4591aa2be16b21adbaa19a8abb76e47ea1) drm/amdgpu: use scheduler load balancing for SDMA CS
d12ae5172f1f drm/amdgpu: use new scheduler load balancing for VMs
adebfff9c806 (refs/bisect/good-adebfff9c806afe1143d69a0174d4580cd27b23d) drm/scheduler: fix setting the priorty for entities
bf1fd52b0632 (origin/amd-staging-drm-next) drm/amdgpu: move gem definitions into amdgpu_gem header
5031ae5f9e5c drm/amdgpu: move psp macro into amdgpu_psp header
[-]

I'm not really sure that
drm/amdgpu: use scheduler load balancing for compute CS
is the offender. It could be one step earlier, too:
drm/amdgpu: use scheduler load balancing for SDMA CS
I'll try running with the SDMA CS patch for the next few days. If you need more, ask!
Hello Christian,

running for the second day _without_ the second patch,
[2/7] drm/amdgpu: use scheduler load balancing for SDMA CS
my system is stable again.

To be clear: I now have only #1 applied on top of amd-staging-drm-next. 'This one' is still in. So we should switch threads.

Dieter
Thanks, Christian.

Greetings,
Dieter

On 01.08.2018 16:27, Christian König wrote:

Since we now deal with multiple rq we need to update all of them, not
just the current one.

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |  3 +--
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 36 ++++++++++++++++++++-----------
 include/drm/gpu_scheduler.h               |  5 ++---
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index df6965761046..9fcc14e2dfcf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -407,12 +407,11 @@ void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx,
 	for (i = 0; i < adev->num_rings; i++) {
 		ring = adev->rings[i];
 		entity = &ctx->rings[i].entity;
-		rq = &ring->sched.sched_rq[ctx_prio];

 		if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
 			continue;

-		drm_sched_entity_set_rq(entity, rq);
+		drm_sched_entity_set_priority(entity, ctx_prio);
 	}
 }

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 05dc6ecd4003..85908c7f913e 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -419,29 +419,39 @@ static void drm_sched_entity_clear_dep(struct dma_fence *f, struct dma_fence_cb
 }

 /**
- * drm_sched_entity_set_rq - Sets the run queue for an entity
+ * drm_sched_entity_set_rq_priority - helper for drm_sched_entity_set_priority
+ */
+static void drm_sched_entity_set_rq_priority(struct drm_sched_rq **rq,
+					     enum drm_sched_priority priority)
+{
+	*rq = &(*rq)->sched->sched_rq[priority];
+}
+
+/**
+ * drm_sched_entity_set_priority - Sets priority of the entity
  *
  * @entity: scheduler entity
- * @rq: scheduler run queue
+ * @priority: scheduler priority
  *
- * Sets the run queue for an entity and removes the entity from the previous
- * run queue in which it was present.
+ * Update the priority of the run queues used for the entity.
  */
-void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
-			     struct drm_sched_rq *rq)
+void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
+				   enum drm_sched_priority priority)
 {
-	if (entity->rq == rq)
-		return;
-
-	BUG_ON(!rq);
+	unsigned int i;

 	spin_lock(&entity->rq_lock);
+
+	for (i = 0; i < entity->num_rq_list; ++i)
+		drm_sched_entity_set_rq_priority(&entity->rq_list[i], priority);
+
 	drm_sched_rq_remove_entity(entity->rq, entity);
-	entity->rq = rq;
-	drm_sched_rq_add_entity(rq, entity);
+	drm_sched_entity_set_rq_priority(&entity->rq, priority);
+	drm_sched_rq_add_entity(entity->rq, entity);
+
 	spin_unlock(&entity->rq_lock);
 }
-EXPORT_SYMBOL(drm_sched_entity_set_rq);
+EXPORT_SYMBOL(drm_sched_entity_set_priority);

 /**
  * drm_sched_dependency_optimized

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 0c4cfe689d4c..22c0f88f7d8f 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -298,9 +298,8 @@ void drm_sched_entity_fini(struct drm_sched_entity *entity);
 void drm_sched_entity_destroy(struct drm_sched_entity *entity);
 void drm_sched_entity_push_job(struct drm_sched_job *sched_job,
 			       struct drm_sched_entity *entity);
-void drm_sched_entity_set_rq(struct drm_sched_entity *entity,
-			     struct drm_sched_rq *rq);
-
+void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
+				   enum drm_sched_priority priority);
 struct drm_sched_fence *drm_sched_fence_create(
	struct drm_sched_entity *s_entity, void *owner);
 void drm_sched_fence_scheduled(struct drm_sched_fence *fence);

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel