On Wed, Oct 21, 2020 at 03:48:59PM +0200, Julia Lawall wrote: > > I worry it's overkill because prev is always used if it is idle even > > if it is on a node remote to the waker. It cuts off the option of a > > wakee moving to a CPU local to the waker which is not equivalent to the > > original behaviour. > > Could it be possible to check p->recent_used_cpu? If that is prev (or on > the same socket?), then prev could be a good choice. If that is on the > same socket as the waker, then maybe the waker would be better. > It is an interesting idea but the treatment of p->recent_used_cpu is a bit strange though. At one point, I looked at reconciling all the select CPU paths into one single pass (prototype worked back in 5.6-era but was not a universal win). As part of that, I remembered that the setting of recent_used_cpu is a problem because it can be set to the wakers recent_used_cpu by this chunk if (want_affine) current->recent_used_cpu = cpu; Fixing that on its own causes other problems. From an unposted series that "fixed it" as the last part of 14 patch series; After select_idle_sibling, the *wakers* recent_used_cpu is set to the CPU used for the wakee. This was necessary at the time as without it, the miss rate was unacceptably high. It still works, but it is less efficient than it can be. This patch leaves the waker state alone and uses either the previous CPU on a hit or the target CPU on a miss when the recently used CPU is examined for idleness. The rest of the series is important as without it, this patch improves the number of hits but the miss rate gets ridiculously high. After the rest of the series, the hit and miss rates are both higher but the miss rate is acceptable. As the hunk means that recent_used_cpu could have been set based on a previous wakeup, it makes it unreliable for making cross-node decisions. p->recent_used_cpu's primary purpose at the moment is to avoid select_idle_sibling searches. -- Mel Gorman SUSE Labs