On Mon, Jan 17, 2022 at 1:19 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Thu, Jan 13, 2022 at 03:39:39PM -0800, Peter Oskolkov wrote:
[...]
> >
> > So this change basically decouples block/wake detection from
> > M:N threading in the sense that the number of servers is now
> > does not have to be M or N, but is more driven by the scalability
> > needs of the userspace application.
>
> So I don't object to having this blocking list, we had that early on in
> the discussions.
>
> *However*, combined with WF_CURRENT_CPU this 1:N userspace model doesn't
> really make sense, also combined with Proxy-Exec (if we ever get that
> sorted) it will fundamentally not work.
>
> More consideration is needed I think...

I was not very clear here. The intent of this change is not to make 1:N
a good general approach, but to make "several running workers per single
server" a viable option.

My guess, based on some numbers/benchmarks from another project, is that
having a single server/runqueue per four or eight running workers,
properly aligned with (= affined to) an AMD chiplet, will be the most
performant solution, compared to both a runqueue per single running
worker and a single global runqueue. On Intel this will probably look
like a single runqueue per core (2 running workers/HT threads).

So in this model a "server" represents a runqueue.

I'll reply to other active umcg discussions shortly.