On 2022-04-11 at 07:42:53 -0700, Dave Hansen wrote: > On 4/11/22 03:14, Pengfei Xu wrote: > > On 2022-04-08 at 09:58:50 -0700, Dave Hansen wrote: > >> On 3/16/22 05:40, Pengfei Xu wrote: > >>> +#ifndef __cpuid_count > >>> +#define __cpuid_count(level, count, a, b, c, d) ({ \ > >>> + __asm__ __volatile__ ("cpuid\n\t" \ > >>> + : "=a" (a), "=b" (b), "=c" (c), "=d" (d) \ > >>> + : "0" (level), "2" (count)); \ > >>> +}) > >>> +#endif > >> > >> > >> By the time you post the next revision, I hope these are merged: > >> > >>> https://lore.kernel.org/all/cover.1647360971.git.reinette.chatre@xxxxxxxxx/ > >> > >> Could you rebase on top of those to avoid duplication, please? > >> > > Yes, thanks for remind. I would like to use the helper __cpuid_count() based > > on above fixed patch. > > And I have a concern with stdlib.h with "-mno-sse" issue, it's a GCC bug, > > and it will be fixed in GCC11. And I could not use kselftest.h, because > > kselftest.h used stdlib.h also... > > Ugh. What a mess. > I found the solution you mentioned as follow. I will try to fix it. Thanks! > > Thanks for the link! From above "id=27600"link, Lu Hongjiu mentioned that: > > It's a "GCC bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99652" > > And id=99652 shows that it's fixed in GCC 11. > > I tried "-mgeneral-regs-only"(it includes "-mno-sse"), it also gcc failed > > with same error as "-mno-sse": > > " > > /usr/include/bits/stdlib-float.h: In function ‘atof’: > > /usr/include/bits/stdlib-float.h:26:1: error: SSE register return with SSE disabled > > " > > And the error shows that: it's related to "stdlib-float.h", seems I didn't > > use stdlib-float.h related function. > > > > In order for this test code to support GCC versions before GCC11, such as > > GCC8, etc., I used this workaround "avoid stdlib.h" for GCC bug, and add > > *aligned_alloc() declaration above. > > > > Because kselftest.h uses stdlib.h also, so I could not use kselftest.h with > > "-mno-see" special GCC requirement, and seems I could not use __cpuid_count() > > patch in kselftest.h from Reinette Chatre. > > Thanks! > > Can you break this test up into two pieces which are spread across two > .c files? One that *just* does the sequences that need XSAVE and to > preserve FPU state with -mno-sse, and then a separate one that uses > stdlib.h and also the kselftest infrastructure? > > For instance, all of the detection and enumeration can be in the nornal > .c file. > Thanks a lot for suggestion! I will put normal function in "prepare_xstate.c", and there is a "prepare_xstate.h" also. Only keep _xsave, _xrstor, key test function in xstate.c. gcc -xx(some other parm) -c prepare_xstate.c gcc -xx -mno-sse xstate.c prepare_xstate.o > ... > >>> +/* Fill FP/XMM/YMM/OPMASK and PKRU xstates into buffer. */ > >>> +static void fill_xstates_buf(struct xsave_buffer *buf, uint32_t xsave_mask) > >>> +{ > >>> + uint32_t i; > >>> + /* The data of FP x87 state are as follows. */ > >> > >> OK, but what *is* this? Random data? Or did you put some data in that > >> means something? > >> > > I learned from filling the fp data by follow functions: fill FPU xstate > > by fldl instructions, push the source operand onto the FPU register stack > > and recorded the fp xstate, then used buffer filling way > > to fill the fpu xstates: > > Some fp data could be set as random value under the "FP in SDM rules". > > Shall I add the comment for the fpu data filling like as follow: > > " > > /* > > * The data of FP x87 state has the same effect as pushing source operand > > * 0x1f2f3f4f1f2f3f4f onto the FPU stack ST0-7. > > */ > > unsigned char fp_data[160] = {... > > " > > No, that's hideous. If you generated the fp state with code, let's > include the *code*. > Thanks for suggestion. Yes, I could use fp instructions back and xsave inline function to xsave fp xstate into buf. I will think about it. Thanks! > > As follow is the pushing source operand 0x1f2f3f4f1f2f3f4f onto the FPU stack > > ST0-7: > > " > > static inline void prepare_fp_buf(uint64_t ui64_fp) > > { > > /* Populate ui64_fp value onto FP registers stack ST0-7. */ > > asm volatile("finit"); > > asm volatile("fldl %0" : : "m" (ui64_fp)); > > asm volatile("fldl %0" : : "m" (ui64_fp)); > > asm volatile("fldl %0" : : "m" (ui64_fp)); > > asm volatile("fldl %0" : : "m" (ui64_fp)); > > asm volatile("fldl %0" : : "m" (ui64_fp)); > > asm volatile("fldl %0" : : "m" (ui64_fp)); > > asm volatile("fldl %0" : : "m" (ui64_fp)); > > asm volatile("fldl %0" : : "m" (ui64_fp)); > > } > > ... > > prepare_fp_buf(0x1f2f3f4f); > > __xsave(buf, xstate_info.mask); > > " > > > > The code to set fp data and display xstate is as follow: > > https://github.com/xupengfe/xstate_show/blob/0411_fp/x86/xstate.c > > > > I found there were 2 different: > >>> + unsigned char fp_data[160] = { > >>> + 0x7f, 0x03, 0x00, 0x00, 0xff, 0x00, 0x00, 0x00, > >>> + 0xf0, 0x19, 0x40, 0x00, 0x00, 0x00, 0x00, 0x00, > > ^Above function shows "0xff, 0x12" not "0xf0, 0x19" in 0x8/0x9 > > offset bytes: > > Bytes 11:8 are used for bits 31:0 of the x87 FPU Instruction Pointer > > Offset(FIP). It could be impacted if I added "vzeroall" and so on instructions, > > in order to match with above fpu function result, will change to "0xff, 0x12". > > > >>> + 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, > >>> + 0x00, 0x00, 0x00, 0x00, 0xff, 0xff, 0x00, 0x00, > > ^Above function shows "0x80, 0x1f" not "0x00, 0x00" in offset > > 0x18/0x19 bytes: > > Bytes 27:24 are used for the MXCSR register. XRSTOR and XRSTORS generate > > general-protection faults (#GP) in response to attempts to set any of the > > reserved bits of the MXCSR register. > > It could be set as "0x00, 0x00", in order to match with result of above > > function, will fill as "0x80, 0x1f". > > I'm totally lost with what you are trying to say here. > I used above "finit, fldl" instructions way to populate fp xstates as before. Sorry for unclear description. > ... > >> I have to wonder if we can do this in a bit more structured way: > >> > >> struct xstate_test > >> { > >> bool (*init)(void); > >> bool (*fill_state)(struct xsave_buffer *buf); > >> ... > >> } > >> > >> Yes, that means indirect function calls, but that's OK since we know it > >> will all be compiled with the "no-sse" flag (and friends). > >> > >> Then fill_xstates_buf() just becomes a loop over an xstate_tests[] array. > >> > > Yes, it's much better to fill the buf in a loop! Thanks! > > Because it's special for pkru filling, so I will improve it like as follow: > > " > > uint32_t xfeature_num; > > ... > > > > /* Fill test byte value into each tested xstate buffer(offset/size). */ > > for (xfeature_num = XFEATURE_YMM; xfeature_num < XFEATURE_MAX; > > xfeature_num++) { > > if (xstate_tested(xfeature_num)) { > > if (xfeature_num == XFEATURE_PKRU) { > > /* Only 0-3 bytes of pkru xstates are allowed to be written. */ > > memset((unsigned char *)buf + xstate_info.offset[XFEATURE_PKRU], > > PKRU_TESTBYTE, sizeof(uint32_t)); > > continue; > > } > > > > memset((unsigned char *)buf + xstate_info.offset[xfeature_num], > > XSTATE_TESTBYTE, xstate_info.size[xfeature_num]); > > } > > } > > " > > I'm not sure that's an improvement. > Ok. Except FP and xstate_bv, this loop will fill each tested value into xstate buffer. Thanks! > >>> +/* > >>> + * Because xstate like XMM, YMM registers are not preserved across function > >>> + * calls, so use inline function with assembly code only to raise signal. > >>> + */ > >>> +static inline long long __raise(long long pid_num, long long sig_num) > >>> +{ > >>> + long long ret, nr = SYS_kill; > >>> + > >>> + register long long arg1 asm("rdi") = pid_num; > >>> + register long long arg2 asm("rsi") = sig_num; > >> > >> I really don't like register variables. They're very fragile. Imagine > >> if someone put a printf() right here. Why don't you just do this with > >> inline assembly directives? > >> > > It seems that adding printf() to an inline function is not good. > > I'm not sure if I understand correctly: should I use inline assembly to > > assign variables to rdi, rsi and then syscall? > > It it's right, I will think about how to implement it by inline assembly way > > and fix it. > > Yes, do it with inline assembly. > I will move value into rdi rsi register as inline assembly like follow: asm volatile("movq %0, %%rdi" : : "r"(pid_num) : "%rdi"); Thanks! --Pengfei