From: Peter Zijlstra [peterz@xxxxxxxxxxxxx]:
> On Fri, Oct 03, 2014 at 08:17:30PM -0700, Leonid Yegoshin wrote:
>> --- a/arch/mips/include/asm/switch_to.h
>> +++ b/arch/mips/include/asm/switch_to.h
>
> Why raw_smp_processor_id() and why evaluate it 3 times, sure compilers
> can be expected to do some CSE but something like:
>
>     int cpu = smp_processor_id();
>
>     if ( ... [cpu] ...)
>
> is far more readable as well.

Sure. But maybe it makes sense to use raw_smp_processor_id() here,
because the preemption counter is already elevated at this point.

>> + flush_vdso_page(); \
>
> So what I didn't see is any talk about the cost of this. Surely a TLB
> flush isn't exactly free.

Well, flush_vdso_page() uses the local version of TLB page flushing, and
that is cheap 'per se' in comparison with a system-wide flush. I also take
precautions to flush only if the memory map matches, so it covers the
situation when one pthread on some map is replaced by another pthread on
the same map on the same CPU. So it flushes only after the previous
pthread of that map has really used the page.

However, some performance loss can be expected from losing the TLB entry.
On low-end cores with a small TLB array, we can expect that this TLB entry
would be evicted anyway after a context switch. On high-end cores we
should expect an FPU to be available, so floating-point emulation should
be very rare (un-normalized operations or so). The only open question is
the middle range: cores which have a large enough TLB (32 entries or more)
but may have no floating-point unit. However, the emulation itself is very
slow, so it is natural to expect the performance degradation from
floating-point emulation there to be much higher than the possible penalty
from the early loss of one TLB entry.
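
To make the flush condition above concrete, here is a minimal user-space
sketch in plain C. It is not the code from the patch: every name with a
sketch_ prefix (sketch_mm, sketch_task, sketch_emul_mm, sketch_switch_to
and so on) is hypothetical, the real TLB flush is replaced by a counter,
and the per-CPU bookkeeping is only my assumption about how "flush only on
the same map, and only after real use" could be tracked. It also reads the
CPU number once into a local variable, as suggested in the review.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4                /* hypothetical CPU count */

struct sketch_mm   { int id; };             /* stands in for a memory map */
struct sketch_task { struct sketch_mm *mm; }; /* stands in for a thread   */

/* Per CPU: the map whose thread last really used the emulation page here. */
static struct sketch_mm *sketch_emul_mm[SKETCH_NR_CPUS];

/* Counts how often the (simulated) local TLB page flush would run. */
static int sketch_flush_count;

/* Record that a thread of cur's map used the emulation page on this CPU. */
static void sketch_mark_emul_use(int cpu, const struct sketch_task *cur)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    sketch_emul_mm[cpu] = cur->mm;
}

/* Flush only if the incoming thread shares the map that used the page. */
static bool sketch_need_flush(int cpu, const struct sketch_task *next)
{
    assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
    const struct sketch_mm *last = sketch_emul_mm[cpu];
    return last != NULL && next->mm == last;
}

/* Context-switch hook: the CPU number is evaluated once by the caller. */
static void sketch_switch_to(int cpu, const struct sketch_task *next)
{
    if (sketch_need_flush(cpu, next)) {
        sketch_flush_count++;           /* placeholder for the local flush */
        sketch_emul_mm[cpu] = NULL;     /* nothing stale until next use    */
    }
}
```

In this sketch, switching to a thread on a different map leaves the marker
in place and costs nothing; the counter only advances when a thread on the
same map comes in after the page was really used on that CPU.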