> -----Original Message-----
> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
> Sent: Thursday, May 6, 2021 12:30 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@xxxxxxxxxxxxx>; Vincent Guittot
> <vincent.guittot@xxxxxxxxxx>
> Cc: tim.c.chen@xxxxxxxxxxxxxxx; catalin.marinas@xxxxxxx; will@xxxxxxxxxx;
> rjw@xxxxxxxxxxxxx; bp@xxxxxxxxx; tglx@xxxxxxxxxxxxx; mingo@xxxxxxxxxx;
> lenb@xxxxxxxxxx; peterz@xxxxxxxxxxxxx; rostedt@xxxxxxxxxxx;
> bsegall@xxxxxxxxxx; mgorman@xxxxxxx; msys.mizuma@xxxxxxxxx;
> valentin.schneider@xxxxxxx; gregkh@xxxxxxxxxxxxxxxxxxx; Jonathan Cameron
> <jonathan.cameron@xxxxxxxxxx>; juri.lelli@xxxxxxxxxx; mark.rutland@xxxxxxx;
> sudeep.holla@xxxxxxx; aubrey.li@xxxxxxxxxxxxxxx;
> linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
> linux-acpi@xxxxxxxxxxxxxxx; x86@xxxxxxxxxx; xuwei (O) <xuwei5@xxxxxxxxxx>;
> Zengtao (B) <prime.zeng@xxxxxxxxxxxxx>; guodong.xu@xxxxxxxxxx; yangyicong
> <yangyicong@xxxxxxxxxx>; Liguozhu (Kenneth) <liguozhu@xxxxxxxxxxxxx>;
> linuxarm@xxxxxxxxxxxxx; hpa@xxxxxxxxx
> Subject: Re: [RFC PATCH v6 3/4] scheduler: scan idle cpu in cluster for tasks
> within one LLC
>
> On 03/05/2021 13:35, Song Bao Hua (Barry Song) wrote:
>
> [...]
>
> >> From: Song Bao Hua (Barry Song)
>
> [...]
>
> >>> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
>
> [...]
>
> >>> On 29/04/2021 00:41, Song Bao Hua (Barry Song) wrote:
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
> >>>
> >>> [...]
> >>>
> >>>>>>>> From: Dietmar Eggemann [mailto:dietmar.eggemann@xxxxxxx]
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>>>>> On 20/04/2021 02:18, Barry Song wrote:
>
> [...]
>
>
> > > On the other hand, according to "sched: Implement smarter wake-affine logic"
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=62470419
> >
> > Proper factor in wake_wide is mainly beneficial of 1:n tasks like postgresql/pgbench.
> > So using the smaller cluster size as factor might help make wake_affine false so
> > improve pgbench.
> >
> > From the commit log, while clients = 2*cpus, the commit made the biggest
> > improvement. In my case, It should be clients=48 for a machine whose LLC
> > size is 24.
> >
> > In Linux, I created a 240MB database and ran "pgbench -c 48 -S -T 20 pgbench"
> > under two different scenarios:
> > 1. page cache always hit, so no real I/O for database read
> > 2. echo 3 > /proc/sys/vm/drop_caches
> >
> > For case 1, using cluster_size and using llc_size will result in similar
> > tps= ~108000, all of 24 cpus have 100% cpu utilization.
> >
> > For case 2, using llc_size still shows better performance.
> >
> > tps for each test round(cluster size as factor in wake_wide):
> > 1398.450887 1275.020401 1632.542437 1412.241627 1611.095692 1381.354294 1539.877146
> > avg tps = 1464
> >
> > tps for each test round(llc size as factor in wake_wide):
> > 1718.402983 1443.169823 1502.353823 1607.415861 1597.396924 1745.651814 1876.802168
> > avg tps = 1641 (+12%)
> >
> > so it seems using cluster_size as factor in "slave >= factor && master >= slave *
> > factor" isn't a good choice for my machine at least.
>
> So SD size = 4 (instead of 24) seems to be too small for `-c 48`.
>
> Just curious, have you seen the benefit of using wake wide on SD size =
> 24 (LLC) compared to not using it at all?

At least in the benchmark I ran today, I have not seen any benefit from
using llc_size. Always returning 0 in wake_wide() seems to be much better.

postgres@ubuntu:$pgbench -i pgbench
postgres@pgbench:$ pgbench -T 120 -c 48 pgbench

Using llc_size, it got to 123 tps.
Always returning 0 in wake_wide(), it got to 158 tps.

Actually, I really couldn't reproduce the performance improvement that the
commit "sched: Implement smarter wake-affine logic" mentioned. On the other
hand, the commit log didn't give the pgbench command parameters used.
I guess the benchmark result will depend heavily on the command parameters
and disk I/O speed.

Thanks
Barry
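
For reference, the check quoted above ("slave >= factor && master >= slave * factor") can be written as a small userspace sketch. This is not the kernel code: wake_wide_sketch() and its parameters are hypothetical names, the flip counts are passed in directly instead of being read from the tasks' wakee_flips, and the swap step reflects my understanding of how wake_wide() orders the two counts. It only illustrates why a smaller factor (cluster size 4) makes the wide path trigger more easily than the LLC size (24).

```c
#include <stdbool.h>

/*
 * Userspace sketch of the wake_wide() heuristic (assumption: modeled on
 * the condition quoted in this thread, not copied from the kernel).
 * master_flips and slave_flips stand in for the waker's and wakee's
 * wakee_flips counters; factor is the domain size under test
 * (e.g. 4 for a cluster, 24 for the LLC).
 */
static bool wake_wide_sketch(unsigned int master_flips,
                             unsigned int slave_flips,
                             unsigned int factor)
{
    /* Treat the larger flip count as the "master" side. */
    if (master_flips < slave_flips) {
        unsigned int tmp = master_flips;
        master_flips = slave_flips;
        slave_flips = tmp;
    }

    /* Wide only if: slave >= factor && master >= slave * factor. */
    return slave_flips >= factor && master_flips >= slave_flips * factor;
}
```

With made-up flip counts of 100 and 5, the sketch reports "wide" for factor 4 but not for factor 24, since 5 is already below 24. Always returning false corresponds to the "always returning 0 in wake_wide()" experiment above.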