> On Oct 1, 2021, at 2:29 PM, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > On Fri, Oct 01 2021 at 08:13, Andy Lutomirski wrote: > >>> On Fri, Oct 1, 2021, at 2:56 AM, Thomas Gleixner wrote: >>> On Thu, Sep 30 2021 at 21:41, Andy Lutomirski wrote: >>>>> On Thu, Sep 30, 2021, at 5:01 PM, Thomas Gleixner wrote: >>> >>>> Now that I read the docs some more, I'm seriously concerned about this >>>> XSAVE design. XSAVES with UINTR is destructive -- it clears UINV. If >>>> we actually use this, then the whole last_cpu "preserve the state in >>>> registers" optimization goes out the window. So does anything that >>>> happens to assume that merely saving the state doesn't destroy it on >>>> respectable modern CPUs XRSTORS will #GP if you XRSTORS twice, which >>>> makes me nervous and would need a serious audit of our XRSTORS paths. >>> >>> I have no idea what you are fantasizing about. You can XRSTORS five >>> times in a row as long as your XSTATE memory image is correct. >> >> I'm just reading TFM, which is some kind of dystopian fantasy. >> >> 11.8.2.4 XRSTORS >> >> Before restoring the user-interrupt state component, XRSTORS verifies >> that UINV is 0. If it is not, XRSTORS causes a general-protection >> fault (#GP) before loading any part of the user-interrupt state >> component. (UINV is IA32_UINTR_MISC[39:32]; XRSTORS does not check the >> contents of the remainder of that MSR.) > > Duh. I was staring at the SDM and searching for a hint. Stupid me! > >> So if UINV is set in the memory image and you XRSTORS five times in a >> row, the first one will work assuming UINV was zero. The second one >> will #GP. > > Yes. I can see what you mean now :) > >> 11.8.2.3 XSAVES >> After saving the user-interrupt state component, XSAVES clears UINV. (UINV is IA32_UINTR_MISC[39:32]; >> XSAVES does not modify the remainder of that MSR.) >> >> So if we're running a UPID-enabled user task and we switch to a kernel >> thread, we do XSAVES and UINV is cleared. Then we switch back to the >> same task and don't do XRSTORS (or otherwise write IA32_UINTR_MISC) >> and UINV is still clear. > > Yes, that has to be mopped up on the way to user space. > >> And we had better clear UINV when running a kernel thread because the >> UPID might get freed or the kernel thread might do some CPL3 >> shenanigans (via EFI, perhaps? I don't know if any firmwares actually >> do this). > > Right. That's what happens already with the current pile. > >> So all this seems to put UINV into the "independent" category of >> feature along with LBR. And the 512-byte wastes from extra copies of >> the legacy area and the loss of the XMODIFIED optimization will just >> be collateral damage. > > So we'd end up with two XSAVES on context switch. We can simply do: > > XSAVES(); > fpu.state.xtsate.uintr.uinv = 0; Could work. As long as UINV is armed, RR can change at any time (maybe just when IF=1? The manual is unclear). But the first XSAVES disarms UINV, so maybe this won’t confuse any callers. > > which allows to do as many XRSTORS in a row as we want. Only the final > one on the way to user space will have to restore the real vector if the > register state is not valid: > > if (fpu_state_valid()) { > if (needs_uinv(current) > wrmsrl(UINV, vector); > } else { > if (needs_uinv(current) > fpu.state.xtsate.uintr.uinv = vector; > XRSTORS(); > } > > Hmm? I like it better than anything else I’ve seen. > > Thanks, > > tglx