On 11/08/16 10:39, Kyle Huey wrote:
> 	}
>
> +	if (test_tsk_thread_flag(prev_p, TIF_NOCPUID) ^
> +	    test_tsk_thread_flag(next_p, TIF_NOCPUID)) {
> +		set_cpuid_faulting(test_tsk_thread_flag(next_p, TIF_NOCPUID));
> +	}
> +
> 	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
> 	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
> 		/* prev and next are different */
> 		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
> 			hard_disable_TSC();
> 		else
> 			hard_enable_TSC();
> 	}

I'm unhappy about this part: we already do two XORs on these after bit
extraction, which is quite inefficient; and at least theoretically we
could be indirecting through the ->stack pointer for every one if gcc
can't tell it won't have changed (we really need to get thread_info
moved into the task_struct allocation and away from the kernel stack,
especially since on x86 the pointer is the same size as the vestigial
structure it points to.)

It would be so much saner to do one xor and then go onto a common slow
path:

	struct thread_info *prev_ti = task_thread_info(prev_p);
	struct thread_info *next_ti = task_thread_info(next_p);

	tif_flipped = prev_ti->flags ^ next_ti->flags;

	if (unlikely(tif_flipped &
		     (_TIF_BLOCKSTEP | _TIF_NOTSC | _TIF_NOCPUID))) {
		if (tif_flipped & _TIF_BLOCKSTEP) {
			...
		}
		if (tif_flipped & _TIF_NOTSC) {
			...
		}
		if (tif_flipped & _TIF_NOCPUID) {
			...
		}
	}

Then we can also replace test_tsk_thread_flag() with
test_ti_thread_flag() in other places in this function.

	-hpa