On Mon, 14 Nov 2022 at 17:42, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
>
> On Sat, 12 Nov 2022 at 03:51, Song Zhang <zhangsong34@xxxxxxxxxx> wrote:
> >
> > Hi, Vincent
> >
> > On 2022/11/3 17:22, Vincent Guittot wrote:
> > > On Thu, 3 Nov 2022 at 10:20, Song Zhang <zhangsong34@xxxxxxxxxx> wrote:
> > >>
> > >>
> > >>
> > >> On 2022/11/3 16:33, Vincent Guittot wrote:
> > >>> On Thu, 3 Nov 2022 at 04:01, Song Zhang <zhangsong34@xxxxxxxxxx> wrote:
> > >>>>
> > >>>> Thanks for your reply!
> > >>>>
> > >>>> On 2022/11/3 2:01, Vincent Guittot wrote:
> > >>>>> On Wed, 2 Nov 2022 at 04:54, Song Zhang <zhangsong34@xxxxxxxxxx> wrote:
> > >>>>>>
> > >>>>>
> > >>>>> This really looks like a v3 of
> > >>>>> https://lore.kernel.org/all/20220810015636.3865248-1-zhangsong34@xxxxxxxxxx/
> > >>>>>
> > >>>>> Please keep versioning.
> > >>>>>
> > >>>>>> Add a new sysctl interface:
> > >>>>>> /proc/sys/kernel/sched_prio_load_balance_enabled
> > >>>>>
> > >>>>> We don't want to add more sysctl knobs for the scheduler; we even
> > >>>>> removed some. A knob usually means that you want to fix your use case
> > >>>>> but the solution doesn't make sense for all cases.
> > >>>>>
> > >>>>
> > >>>> OK, I will remove this knob later.
> > >>>>
> > >>>>>>
> > >>>>>> 0: default behavior
> > >>>>>> 1: enable priority load balance for CFS
> > >>>>>>
> > >>>>>> For co-location of idle and non-idle tasks, when CFS does load balancing,
> > >>>>>> it is reasonable to prefer migrating non-idle tasks and to migrate idle
> > >>>>>> tasks last. This will reduce the interference from SCHED_IDLE tasks
> > >>>>>> as much as possible.
> > >>>>>
> > >>>>> I don't agree that it's always the best choice to migrate a non-idle task 1st.
> > >>>>>
> > >>>>> CPU0 has 1 non-idle task and CPU1 has 1 non-idle task and hundreds of
> > >>>>> idle tasks, and there is an imbalance between the 2 CPUs: migrating the
> > >>>>> non-idle task from CPU1 to CPU0 is not the best choice.
> > >>>>>
> > >>>>
> > >>>> If the non-idle task on CPU1 is running or cache hot, it cannot be
> > >>>> migrated, and idle tasks can still be migrated from CPU1 to CPU0. So I
> > >>>> think it does not matter.
> > >>>
> > >>> What I mean is that migrating non-idle tasks first is not a universal
> > >>> win and not always what we want.
> > >>>
> > >>
> > >> But migrating online (non-idle) tasks first is mostly a trade-off in which
> > >> non-idle (latency-sensitive) tasks obtain more CPU time and the
> > >> interference caused by idle tasks is minimized. I think this makes sense
> > >> in most cases; otherwise, could you point out what else I need to consider?
> > >>
> > >> Best regards.
> > >>
> > >>>>
> > >>>>>>
> > >>>>>> Testcase:
> > >>>>>> - Spawn a large number of idle (SCHED_IDLE) tasks to occupy the CPUs.
> > >>>>>
> > >>>>> What do you mean by a large number?
> > >>>>>
> > >>>>>> - Let non-idle tasks compete with idle tasks for CPU time.
> > >>>>>>
> > >>>>>> Using schbench to test non-idle task latency:
> > >>>>>> $ ./schbench -m 1 -t 10 -r 30 -R 200
> > >>>>>
> > >>>>> How many CPUs do you have?
> > >>>>>
> > >>>>
> > >>>> OK, I left out some details.
> > >>>> My virtual machine has 8 CPUs running a schbench process and 5000
> > >>>> idle tasks. Each idle task is a process running an endless while loop, shown below:
> > >>>
> > >>> How can you care about latency when you start 10 workers on 8 vCPUs
> > >>> with 5000 non-idle threads?
> > >>>
> > >>
> > >> No, no... I spawn 5000 idle (SCHED_IDLE) processes, not 5000 non-idle
> > >> threads, plus 10 non-idle schbench workers on 8 vCPUs.
> > >
> > > yes, spawn 5000 idle tasks, but my point remains the same
> > >
> >
> > I am sorry, but I have not received your reply for a long time, and I
> > am still waiting for it anxiously. In fact, migrating non-idle tasks 1st
> > works well in most scenarios, so it may be possible to add a
> > sched_feat(LB_PRIO) to enable or disable that. Finally, I really hope
> > you can give me some better advice.
>
> I have seen that you posted a v4 5 days ago, which is on my list to be reviewed.
>
> My concern here remains that selecting non-idle tasks 1st is not always
> the best choice, for example when you have 1 non-idle task per CPU
> and thousands of idle tasks moving around. Then, regarding your use
> case, the weight of the 5000 idle threads is around twice the weight
> of your non-idle bench: the summed weight of the idle threads is 15k,
> whereas the weight of your bench is around 6k, IIUC how RPS runs. This
> also means that the idle threads will take a significant share of the
> system's time: about 5000 out of every 7000 ticks. I don't understand
> how you can care about latency in such an extreme case, and I'm
> interested in the real use case where you can have such a situation.
>
> All that to say that idle tasks remain CFS tasks with a small but
> non-null weight, and we should not make them special other than by not
> preempting at wakeup.

Also, as mentioned for a previous version, a task with nice prio 19 has a
weight of 15, so if you replace the 5k idle threads with 1k CFS threads at
nice prio 19, you will face a similar problem. So you can't really rely only
on the idle property of a task.

>
> >
> > Best regards.
> >
> > Song Zhang