> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
> Sent: Wednesday, January 13, 2021 12:00 AM
> To: Morten Rasmussen <morten.rasmussen@xxxxxxx>; Tim Chen
> <tim.c.chen@xxxxxxxxxxxxxxx>
> Cc: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>;
> valentin.schneider@xxxxxxx; catalin.marinas@xxxxxxx; will@xxxxxxxxxx;
> rjw@xxxxxxxxxxxxx; vincent.guittot@xxxxxxxxxx; lenb@xxxxxxxxxx;
> gregkh@xxxxxxxxxxxxxxxxxxx; Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>;
> mingo@xxxxxxxxxx; peterz@xxxxxxxxxxxxx; juri.lelli@xxxxxxxxxx;
> rostedt@xxxxxxxxxxx; bsegall@xxxxxxxxxx; mgorman@xxxxxxx;
> mark.rutland@xxxxxxx; sudeep.holla@xxxxxxx; aubrey.li@xxxxxxxxxxxxxxx;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-acpi@xxxxxxxxxxxxxxx; linuxarm@xxxxxxxxxxxxx; xuwei (O)
> <xuwei5@xxxxxxxxxx>; Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; tiantao (H)
> <tiantao6@xxxxxxxxxxxxx>
> Subject: Re: [RFC PATCH v3 0/2] scheduler: expose the topology of clusters
> and add cluster scheduler
>
> On 11/01/2021 10:28, Morten Rasmussen wrote:
> > On Fri, Jan 08, 2021 at 12:22:41PM -0800, Tim Chen wrote:
> >> On 1/8/21 7:12 AM, Morten Rasmussen wrote:
> >>> On Thu, Jan 07, 2021 at 03:16:47PM -0800, Tim Chen wrote:
> >>>> On 1/6/21 12:30 AM, Barry Song wrote:
>
> [...]
>
> >> I think it is going to depend on the workload. If there are dependent
> >> tasks that communicate with one another, putting them together in the
> >> same cluster will be the right thing to do to reduce communication
> >> costs. On the other hand, if the tasks are independent, putting them
> >> together on the same cluster will increase resource contention and
> >> spreading them out will be better.
> >
> > Agree. That is exactly where I'm coming from. This is all about the
> > task placement policy. We generally tend to spread tasks to avoid
> > resource contention, SMT and caches, which seems to be what you are
> > proposing to extend. I think that makes sense given it can produce
> > significant benefits.
> >
> >> Any thoughts on what is the right clustering "tag" to use to clump
> >> related tasks together?
> >> Cgroup? Pid? Tasks with same mm?
> >
> > I think this is the real question. I think the closest thing we have
> > at the moment is the wakee/waker flip heuristic. This seems to be
> > related. Perhaps the wake_affine tricks can serve as a starting point?
>
> wake_wide() switches between packing (select_idle_sibling(), llc_size
> CPUs) and spreading (find_idlest_cpu(), all CPUs).
>
> AFAICS, since none of the sched domains set SD_BALANCE_WAKE, currently
> all wakeups are (llc-)packed.

Sorry for the late response. I have been tied up with some other topology
issues recently.

Regarding "all wakeups are (llc-)packed": do you mean that want_affine
currently only affects the choice of new_cpu, and that the wakeup path
always ends up in select_idle_sibling() rather than find_idlest_cpu(),
since no sched_domain sets SD_BALANCE_WAKE?

> select_task_rq_fair()
>
>     for_each_domain(cpu, tmp)
>
>         if (tmp->flags & sd_flag)
>             sd = tmp;
>
> In case we would like to further distinguish between llc-packing and
> even narrower (cluster or MC-L2)-packing, we would introduce a 2nd-level
> packing vs. spreading heuristic further down in sis().

I didn't get your point on the "2nd-level packing". Could you describe it
in more detail? Do you mean we would need a separate avg_scan_cost
calculation and separate sched_feat(SIS_*) handling for the cluster (or
MC-L2) level, since the cluster and the LLC are not at the same physical
level?
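If so, my rough understanding is something like the sketch below. It
assumes the cluster is represented as its own sched_domain level, so each
level naturally carries its own avg_scan_cost; it is loosely modelled on
select_idle_cpu()'s SIS_PROP accounting, with the scan-budget math and
error handling left out. scan_level_for_idle() is a made-up name, not
anything in the patch.

    static int scan_level_for_idle(struct task_struct *p,
                                   struct sched_domain *sd, int target)
    {
            u64 time = cpu_clock(smp_processor_id());
            int cpu, idle_cpu = -1;

            /* walk this level's span, starting near target */
            for_each_cpu_wrap(cpu, sched_domain_span(sd), target) {
                    if (!cpumask_test_cpu(cpu, p->cpus_ptr))
                            continue;
                    if (available_idle_cpu(cpu)) {
                            idle_cpu = cpu;
                            break;
                    }
            }

            /* charge the scan to this level only, as SIS_PROP does per LLC */
            time = cpu_clock(smp_processor_id()) - time;
            update_avg(&sd->avg_scan_cost, time);

            return idle_cpu;
    }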
>
> IMHO, Barry's current implementation doesn't do this right now. Instead
> he's trying to pack on cluster first and if not successful look further
> among the remaining llc CPUs for an idle CPU.

Yes, that is exactly what the current patch is doing: pack on the cluster
first, and only if no idle CPU is found there, scan the remaining LLC
CPUs. A rough sketch of that scan order is below my signature.

Thanks
Barry
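A minimal sketch of the cluster-first scan order, not the actual patch:
cluster_mask and llc_mask are passed in here as stand-ins for whatever the
topology code exposes for the L2/cluster span and the LLC span, and the
SIS_PROP-style cost limits are omitted.

    static int scan_cluster_then_llc(struct task_struct *p,
                                     const struct cpumask *cluster_mask,
                                     const struct cpumask *llc_mask,
                                     int target)
    {
            /* stack cpumask for brevity; real code would avoid this */
            struct cpumask rest;
            int cpu;

            /* 1st pass: CPUs sharing the cluster (e.g. L2) with target */
            for_each_cpu_and(cpu, cluster_mask, p->cpus_ptr) {
                    if (available_idle_cpu(cpu))
                            return cpu;
            }

            /* 2nd pass: the rest of the LLC, cluster CPUs excluded */
            cpumask_andnot(&rest, llc_mask, cluster_mask);
            for_each_cpu_and(cpu, &rest, p->cpus_ptr) {
                    if (available_idle_cpu(cpu))
                            return cpu;
            }

            return -1;      /* nothing idle; caller falls back to target */
    }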