On 9/1/21 6:00 AM, Borislav Petkov wrote: >> So, this code takes the coward's way out: it *forces* TIF_NEED_FPU_LOAD >> to be clear by making the registers live with fpregs_restore_userregs(). >> That lets it just use WRMSR instead of dealing with the XSAVE buffer >> directly. If it didn't do this with the *WHOLE* set of user FPU state, >> we'd need more fine-granted "NEED_*_LOAD" tracking than our one FPU bit. >> >> This is also *only* safe because the task is newly-exec()'d and the FPU >> state was just reset. Otherwise, we might have had to worry that the >> non-PL3 SSPs have garbage or that non-SHSTK bits are set in MSR_IA32_U_CET. >> >> That said, after staring at it, I *think* this code is functionally >> correct and OK performance-wise. > Right, except that that is being done in > setup_signal_shadow_stack()/restore_signal_shadow_stack() too, for the > restore token. > > Which means, a potential XRSTOR each time just for a single MSR. That > means, twice per signal in the worst case. > > Which means, shadow stack should be pretty noticeable in signal-heavy > benchmarks... Ahh, good point. I totally missed the signal side. Yu-cheng, it would probably be wise to take a look at where TIF_NEED_FPU_LOAD is set in the signal paths. I suspect the new shadow stack code is clearing it pretty quickly. That's not *necessarily* a waste because it has to be XRSTOR'd somewhere before returning to userspace. But, it does impose an extra XSAVE/XRSTOR burden if the code is preempted somewhere between setup_signal_shadow_stack()/restore_signal_shadow_stack() and the return to usersspace. Some performance numbers would be nice. Even nicer would be to make your code work without requiring TIF_NEED_FPU_LOAD to be clear, or actively clearing it.