Re: [PATCH v8 1/1] selftests/x86/xstate: Add xstate test cases for XSAVE feature

Dave Hansen <dave.hansen@xxxxxxxxx> · Mon, 11 Apr 2022 07:42:53 -0700

On 4/11/22 03:14, Pengfei Xu wrote:
> On 2022-04-08 at 09:58:50 -0700, Dave Hansen wrote:
>> On 3/16/22 05:40, Pengfei Xu wrote:
>>> +#ifndef __cpuid_count
>>> +#define __cpuid_count(level, count, a, b, c, d) ({	\
>>> +	__asm__ __volatile__ ("cpuid\n\t"	\
>>> +			: "=a" (a), "=b" (b), "=c" (c), "=d" (d)	\
>>> +			: "0" (level), "2" (count));	\
>>> +})
>>> +#endif
>>
>>
>> By the time you post the next revision, I hope these are merged:
>>
>>> https://lore.kernel.org/all/cover.1647360971.git.reinette.chatre@xxxxxxxxx/
>>
>> Could you rebase on top of those to avoid duplication, please?
>>
>   Yes, thanks for remind. I would like to use the helper __cpuid_count() based
>   on above fixed patch.
>   And I have a concern with stdlib.h with "-mno-sse" issue, it's a GCC bug,
>   and it will be fixed in GCC11. And I could not use kselftest.h, because
>   kselftest.h used stdlib.h also...

Ugh.  What a mess.

>   Thanks for the link! From above "id=27600"link, Lu Hongjiu mentioned that:
>   It's a "GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99652";
>   And id=99652 shows that it's fixed in GCC 11.
>   I tried "-mgeneral-regs-only"(it includes "-mno-sse"), it also gcc failed
>   with same error as "-mno-sse":
>   "
> /usr/include/bits/stdlib-float.h: In function ‘atof’:
> /usr/include/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled
>   "
>   And the error shows that: it's related to "stdlib-float.h", seems I didn't
>   use stdlib-float.h related function.
> 
>   In order for this test code to support GCC versions before GCC11, such as
>   GCC8, etc., I used this workaround "avoid stdlib.h" for GCC bug, and add
>   *aligned_alloc() declaration above.
> 
>   Because kselftest.h uses stdlib.h also, so I could not use kselftest.h with
>   "-mno-see" special GCC requirement, and seems I could not use __cpuid_count()
>   patch in kselftest.h from Reinette Chatre.
>   Thanks!

Can you break this test up into two pieces which are spread across two
.c files?  One that *just* does the sequences that need XSAVE and to
preserve FPU state with -mno-sse, and then a separate one that uses
stdlib.h and also the kselftest infrastructure?

For instance, all of the detection and enumeration can be in the nornal
.c file.

...
>>> +/* Fill FP/XMM/YMM/OPMASK and PKRU xstates into buffer. */
>>> +static void fill_xstates_buf(struct xsave_buffer *buf, uint32_t xsave_mask)
>>> +{
>>> +	uint32_t i;
>>> +	/* The data of FP x87 state are as follows. */
>>
>> OK, but what *is* this?  Random data?  Or did you put some data in that
>> means something?
>>
> I learned from filling the fp data by follow functions: fill FPU xstate
> by fldl instructions, push the source operand onto the FPU register stack
>  and recorded the fp xstate, then used buffer filling way
> to fill the fpu xstates:
> Some fp data could be set as random value under the "FP in SDM rules".
> Shall I add the comment for the fpu data filling like as follow:
> "
> /*
>  * The data of FP x87 state has the same effect as pushing source operand
>  * 0x1f2f3f4f1f2f3f4f onto the FPU stack ST0-7.
>  */
> unsigned char fp_data[160] = {...
> "

No, that's hideous.  If you generated the fp state with code, let's
include the *code*.

> As follow is the pushing source operand 0x1f2f3f4f1f2f3f4f onto the FPU stack
> ST0-7:
> "
> static inline void prepare_fp_buf(uint64_t ui64_fp)
> {
> 	/* Populate ui64_fp value onto FP registers stack ST0-7. */
> 	asm volatile("finit");
> 	asm volatile("fldl %0" : : "m" (ui64_fp));
> 	asm volatile("fldl %0" : : "m" (ui64_fp));
> 	asm volatile("fldl %0" : : "m" (ui64_fp));
> 	asm volatile("fldl %0" : : "m" (ui64_fp));
> 	asm volatile("fldl %0" : : "m" (ui64_fp));
> 	asm volatile("fldl %0" : : "m" (ui64_fp));
> 	asm volatile("fldl %0" : : "m" (ui64_fp));
> 	asm volatile("fldl %0" : : "m" (ui64_fp));
> }
> ...
> prepare_fp_buf(0x1f2f3f4f);
> __xsave(buf, xstate_info.mask);
> "
> 
> The code to set fp data and display xstate is as follow:
> https://github.com/xupengfe/xstate_show/blob/0411_fp/x86/xstate.c
> 
> I found there were 2 different:
>>> +	unsigned char fp_data[160] = {
>>> +		0x7f, 0x03, 0x00, 0x00, 0xff, 0x00, 0x00, 0x00,
>>> +		0xf0, 0x19, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00,
>             ^Above function shows "0xff, 0x12" not "0xf0, 0x19" in 0x8/0x9
> offset bytes:
> Bytes 11:8 are used for bits 31:0 of the x87 FPU Instruction Pointer
> Offset(FIP). It could be impacted if I added "vzeroall" and so on instructions,
> in order to match with above fpu function result, will change to "0xff, 0x12".
> 
>>> +		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
>>> +		0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00,
>             ^Above function shows "0x80, 0x1f" not "0x00, 0x00" in offset
> 0x18/0x19 bytes:
> Bytes 27:24 are used for the MXCSR register. XRSTOR and XRSTORS generate
> general-protection faults (#GP) in response to attempts to set any of the
> reserved bits of the MXCSR register.
> It could be set as "0x00, 0x00", in order to match with result of above
> function, will fill as "0x80, 0x1f".

I'm totally lost with what you are trying to say here.

...
>> I have to wonder if we can do this in a bit more structured way:
>>
>> struct xstate_test
>> {
>> 	bool (*init)(void);
>> 	bool (*fill_state)(struct xsave_buffer *buf);
>> 	...
>> }
>>
>> Yes, that means indirect function calls, but that's OK since we know it
>> will all be compiled with the "no-sse" flag (and friends).
>>
>> Then fill_xstates_buf() just becomes a loop over an xstate_tests[] array.
>>
>   Yes, it's much better to fill the buf in a loop! Thanks!
>   Because it's special for pkru filling, so I will improve it like as follow:
> "
> 	uint32_t xfeature_num;
> ...
> 
> 	/* Fill test byte value into each tested xstate buffer(offset/size). */
> 	for (xfeature_num = XFEATURE_YMM; xfeature_num < XFEATURE_MAX;
> 			xfeature_num++) {
> 		if (xstate_tested(xfeature_num)) {
> 			if (xfeature_num == XFEATURE_PKRU) {
> 				/* Only 0-3 bytes of pkru xstates are allowed to be written. */
> 				memset((unsigned char *)buf + xstate_info.offset[XFEATURE_PKRU],
> 					PKRU_TESTBYTE, sizeof(uint32_t));
> 				continue;
> 			}
> 
> 			memset((unsigned char *)buf + xstate_info.offset[xfeature_num],
> 				XSTATE_TESTBYTE, xstate_info.size[xfeature_num]);
> 		}
> 	}
> "

I'm not sure that's an improvement.

>>> +/*
>>> + * Because xstate like XMM, YMM registers are not preserved across function
>>> + * calls, so use inline function with assembly code only to raise signal.
>>> + */
>>> +static inline long long __raise(long long pid_num, long long sig_num)
>>> +{
>>> +	long long ret, nr = SYS_kill;
>>> +
>>> +	register long long arg1 asm("rdi") = pid_num;
>>> +	register long long arg2 asm("rsi") = sig_num;
>>
>> I really don't like register variables.  They're very fragile.  Imagine
>> if someone put a printf() right here.  Why don't you just do this with
>> inline assembly directives?
>>
>   It seems that adding printf() to an inline function is not good.
>   I'm not sure if I understand correctly: should I use inline assembly to
>   assign variables to rdi, rsi and then syscall?
>   It it's right, I will think about how to implement it by inline assembly way
>   and fix it.

Yes, do it with inline assembly.