On 4/11/22 03:14, Pengfei Xu wrote: > On 2022-04-08 at 09:58:50 -0700, Dave Hansen wrote: >> On 3/16/22 05:40, Pengfei Xu wrote: >>> +#ifndef __cpuid_count >>> +#define __cpuid_count(level, count, a, b, c, d) ({ \ >>> + __asm__ __volatile__ ("cpuid\n\t" \ >>> + : "=a" (a), "=b" (b), "=c" (c), "=d" (d) \ >>> + : "0" (level), "2" (count)); \ >>> +}) >>> +#endif >> >> >> By the time you post the next revision, I hope these are merged: >> >>> https://lore.kernel.org/all/cover.1647360971.git.reinette.chatre@xxxxxxxxx/ >> >> Could you rebase on top of those to avoid duplication, please? >> > Yes, thanks for remind. I would like to use the helper __cpuid_count() based > on above fixed patch. > And I have a concern with stdlib.h with "-mno-sse" issue, it's a GCC bug, > and it will be fixed in GCC11. And I could not use kselftest.h, because > kselftest.h used stdlib.h also... Ugh. What a mess. > Thanks for the link! From above "id=27600"link, Lu Hongjiu mentioned that: > It's a "GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99652" > And id=99652 shows that it's fixed in GCC 11. > I tried "-mgeneral-regs-only"(it includes "-mno-sse"), it also gcc failed > with same error as "-mno-sse": > " > /usr/include/bits/stdlib-float.h: In function ‘atof’: > /usr/include/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled > " > And the error shows that: it's related to "stdlib-float.h", seems I didn't > use stdlib-float.h related function. > > In order for this test code to support GCC versions before GCC11, such as > GCC8, etc., I used this workaround "avoid stdlib.h" for GCC bug, and add > *aligned_alloc() declaration above. > > Because kselftest.h uses stdlib.h also, so I could not use kselftest.h with > "-mno-see" special GCC requirement, and seems I could not use __cpuid_count() > patch in kselftest.h from Reinette Chatre. > Thanks! Can you break this test up into two pieces which are spread across two .c files? One that *just* does the sequences that need XSAVE and to preserve FPU state with -mno-sse, and then a separate one that uses stdlib.h and also the kselftest infrastructure? For instance, all of the detection and enumeration can be in the nornal .c file. ... >>> +/* Fill FP/XMM/YMM/OPMASK and PKRU xstates into buffer. */ >>> +static void fill_xstates_buf(struct xsave_buffer *buf, uint32_t xsave_mask) >>> +{ >>> + uint32_t i; >>> + /* The data of FP x87 state are as follows. */ >> >> OK, but what *is* this? Random data? Or did you put some data in that >> means something? >> > I learned from filling the fp data by follow functions: fill FPU xstate > by fldl instructions, push the source operand onto the FPU register stack > and recorded the fp xstate, then used buffer filling way > to fill the fpu xstates: > Some fp data could be set as random value under the "FP in SDM rules". > Shall I add the comment for the fpu data filling like as follow: > " > /* > * The data of FP x87 state has the same effect as pushing source operand > * 0x1f2f3f4f1f2f3f4f onto the FPU stack ST0-7. > */ > unsigned char fp_data[160] = {... > " No, that's hideous. If you generated the fp state with code, let's include the *code*. > As follow is the pushing source operand 0x1f2f3f4f1f2f3f4f onto the FPU stack > ST0-7: > " > static inline void prepare_fp_buf(uint64_t ui64_fp) > { > /* Populate ui64_fp value onto FP registers stack ST0-7. */ > asm volatile("finit"); > asm volatile("fldl %0" : : "m" (ui64_fp)); > asm volatile("fldl %0" : : "m" (ui64_fp)); > asm volatile("fldl %0" : : "m" (ui64_fp)); > asm volatile("fldl %0" : : "m" (ui64_fp)); > asm volatile("fldl %0" : : "m" (ui64_fp)); > asm volatile("fldl %0" : : "m" (ui64_fp)); > asm volatile("fldl %0" : : "m" (ui64_fp)); > asm volatile("fldl %0" : : "m" (ui64_fp)); > } > ... > prepare_fp_buf(0x1f2f3f4f); > __xsave(buf, xstate_info.mask); > " > > The code to set fp data and display xstate is as follow: > https://github.com/xupengfe/xstate_show/blob/0411_fp/x86/xstate.c > > I found there were 2 different: >>> + unsigned char fp_data[160] = { >>> + 0x7f, 0x03, 0x00, 0x00, 0xff, 0x00, 0x00, 0x00, >>> + 0xf0, 0x19, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, > ^Above function shows "0xff, 0x12" not "0xf0, 0x19" in 0x8/0x9 > offset bytes: > Bytes 11:8 are used for bits 31:0 of the x87 FPU Instruction Pointer > Offset(FIP). It could be impacted if I added "vzeroall" and so on instructions, > in order to match with above fpu function result, will change to "0xff, 0x12". > >>> + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, >>> + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, > ^Above function shows "0x80, 0x1f" not "0x00, 0x00" in offset > 0x18/0x19 bytes: > Bytes 27:24 are used for the MXCSR register. XRSTOR and XRSTORS generate > general-protection faults (#GP) in response to attempts to set any of the > reserved bits of the MXCSR register. > It could be set as "0x00, 0x00", in order to match with result of above > function, will fill as "0x80, 0x1f". I'm totally lost with what you are trying to say here. ... >> I have to wonder if we can do this in a bit more structured way: >> >> struct xstate_test >> { >> bool (*init)(void); >> bool (*fill_state)(struct xsave_buffer *buf); >> ... >> } >> >> Yes, that means indirect function calls, but that's OK since we know it >> will all be compiled with the "no-sse" flag (and friends). >> >> Then fill_xstates_buf() just becomes a loop over an xstate_tests[] array. >> > Yes, it's much better to fill the buf in a loop! Thanks! > Because it's special for pkru filling, so I will improve it like as follow: > " > uint32_t xfeature_num; > ... > > /* Fill test byte value into each tested xstate buffer(offset/size). */ > for (xfeature_num = XFEATURE_YMM; xfeature_num < XFEATURE_MAX; > xfeature_num++) { > if (xstate_tested(xfeature_num)) { > if (xfeature_num == XFEATURE_PKRU) { > /* Only 0-3 bytes of pkru xstates are allowed to be written. */ > memset((unsigned char *)buf + xstate_info.offset[XFEATURE_PKRU], > PKRU_TESTBYTE, sizeof(uint32_t)); > continue; > } > > memset((unsigned char *)buf + xstate_info.offset[xfeature_num], > XSTATE_TESTBYTE, xstate_info.size[xfeature_num]); > } > } > " I'm not sure that's an improvement. >>> +/* >>> + * Because xstate like XMM, YMM registers are not preserved across function >>> + * calls, so use inline function with assembly code only to raise signal. >>> + */ >>> +static inline long long __raise(long long pid_num, long long sig_num) >>> +{ >>> + long long ret, nr = SYS_kill; >>> + >>> + register long long arg1 asm("rdi") = pid_num; >>> + register long long arg2 asm("rsi") = sig_num; >> >> I really don't like register variables. They're very fragile. Imagine >> if someone put a printf() right here. Why don't you just do this with >> inline assembly directives? >> > It seems that adding printf() to an inline function is not good. > I'm not sure if I understand correctly: should I use inline assembly to > assign variables to rdi, rsi and then syscall? > It it's right, I will think about how to implement it by inline assembly way > and fix it. Yes, do it with inline assembly.