On Tue, Mar 14, 2017 at 12:01 PM, H. Peter Anvin <hpa@xxxxxxxxx> wrote: > On 11/08/16 10:39, Kyle Huey wrote: >> } >> >> + if (test_tsk_thread_flag(prev_p, TIF_NOCPUID) ^ >> + test_tsk_thread_flag(next_p, TIF_NOCPUID)) { >> + set_cpuid_faulting(test_tsk_thread_flag(next_p, TIF_NOCPUID)); >> + } >> + >> if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^ >> test_tsk_thread_flag(next_p, TIF_NOTSC)) { >> /* prev and next are different */ >> if (test_tsk_thread_flag(next_p, TIF_NOTSC)) >> hard_disable_TSC(); >> else >> hard_enable_TSC(); >> } > > I'm unhappy about this part: we already do two XORs on these after bit > extraction, which is quite inefficient; and at least theoretically we > could be indirecting though the ->stack pointer for every one if gcc > can't tell it won't have changed (we really need to get thread_info > moved into the task_struct allocation and away from the kernel stack, > especially since on x86 the pointer is the same size as the vestigial > structure it points to.) > > It would be so much saner to do one xor and then go onto a common slow path: > > struct thread_info *prev_ti = task_thread_info(prev_p); > struct thread_info *next_ti = task_thread_info(next_p); > > tif_flipped = prev_ti->flags ^ next_ti->flags; > > if (unlikely(tif_flipped & > (_TIF_BLOCKSTEP | _TIF_NOTSC | _TIF_NOCPUID))) { > if (tif_flipped & _TIF_BLOCKSTEP) { > ... > } > if (tif_flipped & _TIF_NOTSC) { > ... > } > if (tif_flipped & _TIF_NOCPUID) { > ... > } > } > > Then we can also replace test_tsk_thread_flag() with > test_ti_thread_flag() in other places in this function. That's largely what we ended up doing. See https://lkml.org/lkml/2017/2/14/80 and the latest version of this patch, https://lkml.org/lkml/2017/3/11/197. - Kyle