On 11/08/16 10:39, Kyle Huey wrote:
> 	}
>
> +	if (test_tsk_thread_flag(prev_p, TIF_NOCPUID) ^
> +	    test_tsk_thread_flag(next_p, TIF_NOCPUID)) {
> +		set_cpuid_faulting(test_tsk_thread_flag(next_p, TIF_NOCPUID));
> +	}
> +
> 	if (test_tsk_thread_flag(prev_p, TIF_NOTSC) ^
> 	    test_tsk_thread_flag(next_p, TIF_NOTSC)) {
> 		/* prev and next are different */
> 		if (test_tsk_thread_flag(next_p, TIF_NOTSC))
> 			hard_disable_TSC();
> 		else
> 			hard_enable_TSC();
> 	}

I'm unhappy about this part: we already do two XORs on these after bit
extraction, which is quite inefficient; and at least theoretically we
could be indirecting through the ->stack pointer for every one if gcc
can't tell it won't have changed (we really need to get thread_info
moved into the task_struct allocation and away from the kernel stack,
especially since on x86 the pointer is the same size as the vestigial
structure it points to.)

It would be so much saner to do one xor and then go onto a common slow
path:

	struct thread_info *prev_ti = task_thread_info(prev_p);
	struct thread_info *next_ti = task_thread_info(next_p);

	tif_flipped = prev_ti->flags ^ next_ti->flags;

	if (unlikely(tif_flipped &
		     (_TIF_BLOCKSTEP | _TIF_NOTSC | _TIF_NOCPUID))) {
		if (tif_flipped & _TIF_BLOCKSTEP) {
			...
		}
		if (tif_flipped & _TIF_NOTSC) {
			...
		}
		if (tif_flipped & _TIF_NOCPUID) {
			...
		}
	}

Then we can also replace test_tsk_thread_flag() with
test_ti_thread_flag() in other places in this function.

	-hpa