> >
> > The whole picture is a little complex.
> > On a 4-socket EX machine, the 3~5% hackbench thread regression due to
> > 4dcfe can be recovered by ab2789.
> >
> > But on a 2-socket SNB machine, the 1024-client loop netperf TCP-RR
> > test has about a 9% regression, and your patch seems to recover 2~3%.
> >
> > And on a 2-socket NHM, one of our private benchmarks was hit hard, with
> > a 20+% regression. That benchmark just runs 4 processes; each process
> > opens a thread, and the thread's task is to locate random pages and
> > then read from a page 4 times and write data into it once. The
> > ab2789 commit does not seem to help our benchmark.

In fact, since the L1 cache and D-TLB are shared between SMT siblings,
not every application can benefit from being woken up on a different
core.

>
> Ok. Can you please try a couple of experiments with two kernels? Two
> kernels being the base kernel (prior to 4dcfe1025b513c2c) and the second
> kernel with the commit ab2789213d224202237292d78aaa0c386c7b28b2.

Testing has started on the impacted NHM and SNB machines for the netperf
and hackbench thread benchmarks and our small case.

>
> One experiment with p-states turned off and the second experiment with
> c-states turned off.
>
> I suspect mostly deeper core c-states might be contributing to the
> behavior that you are seeing.
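
For anyone who wants to try something shaped like the small case
described above, here is a rough sketch of the access pattern only. It is
not the private benchmark itself: the buffer size, iteration count, and
every name below (touch_pages, worker, run, N_PAGES, N_ITERS) are
assumptions made up for illustration, and it assumes Linux with the
"fork" start method. Interpreter overhead will dominate the timing, so a
real measurement would need a native implementation; this only shows the
"4 processes, one thread each, random page, 4 reads and 1 write" loop.

```python
import mmap
import multiprocessing as mp
import random
import threading

PAGE_SIZE = mmap.PAGESIZE
N_PROCS = 4          # from the description: 4 processes
N_PAGES = 1024       # assumed working-set size per process, in pages
N_ITERS = 100_000    # assumed number of page touches per thread


def touch_pages(buf, n_pages, n_iters, seed):
    """Pick a random page, read from it 4 times, then write to it once."""
    rng = random.Random(seed)
    checksum = 0
    for i in range(n_iters):
        off = rng.randrange(n_pages) * PAGE_SIZE
        for k in range(4):              # 4 reads from the chosen page
            checksum += buf[off + k]
        buf[off] = i & 0xFF             # 1 write into the same page
    return checksum


def worker(seed):
    """One process: map an anonymous buffer and run the loop in one thread."""
    buf = mmap.mmap(-1, N_PAGES * PAGE_SIZE)
    t = threading.Thread(target=touch_pages,
                         args=(buf, N_PAGES, N_ITERS, seed))
    t.start()
    t.join()
    buf.close()


def run():
    """Start N_PROCS workers and return their exit codes."""
    ctx = mp.get_context("fork")        # assumption: Linux / fork available
    procs = [ctx.Process(target=worker, args=(s,)) for s in range(N_PROCS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    print(run())
```

Running it under the two kernels, once with p-states off and once with
c-states off, would follow the experiment plan quoted above; how those
states are disabled is platform specific and is not covered by this
sketch.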