Re: [PATCH] drm/scheduler: fix race condition in load balancer

Hi Christian,

On 1/14/20 5:01 PM, Christian König wrote:

Before this patch:

sched_name     number of times it got scheduled
=========      ==================================
sdma0          314
sdma1          32
comp_1.0.0     56
comp_1.1.0     0
comp_1.1.1     0
comp_1.2.0     0
comp_1.2.1     0
comp_1.3.0     0
comp_1.3.1     0

After this patch:

sched_name     number of times it got scheduled
=========      ================================
sdma1          243
sdma0          164
comp_1.0.1     14
comp_1.1.0     11
comp_1.1.1     10
comp_1.2.0     15
comp_1.2.1     14
comp_1.3.0     10
comp_1.3.1     10

Well, that is still rather nice to have. Why does that happen?

I think I know why it happens. At init, every entity's rq gets assigned to sched_list[0]. I added some prints to check what we compare in drm_sched_entity_get_free_sched.

It turns out that most of the time it compares zero values (num_jobs(0) < min_jobs(0)), so most of the time the 1st rq (sdma0, comp_1.0.0) was picked by drm_sched_entity_get_free_sched.
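
For reference, the selection logic I am talking about looks roughly like this (a simplified sketch of drm_sched_entity_get_free_sched, trimmed for illustration, not a verbatim copy from the tree):

static struct drm_sched_rq *
drm_sched_entity_get_free_sched(struct drm_sched_entity *entity)
{
	struct drm_sched_rq *rq = NULL;
	unsigned int min_jobs = UINT_MAX, num_jobs;
	unsigned int i;

	for (i = 0; i < entity->num_sched_list; ++i) {
		struct drm_gpu_scheduler *sched = entity->sched_list[i];

		if (!sched->ready)
			continue;

		num_jobs = atomic_read(&sched->num_jobs);
		/*
		 * Right after init every sched still has num_jobs == 0, so
		 * only the first ready sched passes this strict "<" and
		 * sched_list[0] (sdma0, comp_1.0.0) keeps winning.
		 */
		if (num_jobs < min_jobs) {
			min_jobs = num_jobs;
			rq = &sched->sched_rq[entity->priority];
		}
	}

	return rq;
}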


This patch was not correct: it had an extra atomic_inc(num_jobs) in drm_sched_job_init. I think that extra increment added a bit of randomness, which helped produce a better job distribution.
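
In other words, the broken version effectively did something like this (a hypothetical reconstruction, not the actual diff; as far as I can see, num_jobs is normally only bumped when the job is pushed to the entity):

int drm_sched_job_init(struct drm_sched_job *job,
		       struct drm_sched_entity *entity,
		       void *owner)
{
	struct drm_gpu_scheduler *sched = entity->rq->sched;

	/* ... existing job setup ... */
	job->sched = sched;

	/*
	 * The stray increment: the job gets counted here and again when it
	 * is pushed, so the "least busy" comparison sees inflated, unevenly
	 * updated counters, which accidentally spreads jobs around.
	 */
	atomic_inc(&sched->num_jobs);

	return 0;
}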

I've updated my previous RFC patch, which uses the time consumed by each sched for load balancing, with a twist of ignoring the previously scheduled sched/rq (rough sketch of the idea below). Let me know what you think.
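
For context, the idea is roughly the following. This is just a sketch to show the intent, not the actual RFC patch; the per-scheduler accumulated-runtime counter (called elapsed_ns here) and its bookkeeping are placeholders.

static struct drm_sched_rq *
drm_sched_entity_pick_least_loaded(struct drm_sched_entity *entity)
{
	struct drm_sched_rq *rq = NULL;
	u64 min_time = U64_MAX;
	unsigned int i;

	for (i = 0; i < entity->num_sched_list; ++i) {
		struct drm_gpu_scheduler *sched = entity->sched_list[i];
		u64 t;

		if (!sched->ready)
			continue;

		/* The twist: skip the sched/rq this entity ran on last time. */
		if (entity->rq && entity->rq->sched == sched)
			continue;

		/* Placeholder counter: total GPU time consumed by this sched. */
		t = atomic64_read(&sched->elapsed_ns);
		if (t < min_time) {
			min_time = t;
			rq = &sched->sched_rq[entity->priority];
		}
	}

	/* If everything else was skipped or not ready, keep the current rq. */
	return rq ? rq : entity->rq;
}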


Regards,

Nirmoy


Christian.





