> On Aug 26, 2018, at 7:18 AM, Jason A. Donenfeld <Jason@xxxxxxxxx> wrote: > > On Sun, Aug 26, 2018 at 8:06 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: >>> Do you mean to say you intend to make kernel_fpu_end() and >>> kernel_neon_end() only actually do something upon context switch, but >>> not when it's actually called? So that multiple calls to >>> kernel_fpu_begin() and kernel_neon_begin() can be made without >>> penalty? >> >> On context switch and exit to user. That allows to keep those code pathes >> fully preemptible. Still twisting my brain around the details. > > Just to make sure we're on the same page, the goal is so that this code: > > kernel_fpu_begin(); > kernel_fpu_end(); > kernel_fpu_begin(); > kernel_fpu_end(); > kernel_fpu_begin(); > kernel_fpu_end(); > kernel_fpu_begin(); > kernel_fpu_end(); > kernel_fpu_begin(); > kernel_fpu_end(); > kernel_fpu_begin(); > kernel_fpu_end(); > ... > > has the same performance as this code: > > kernel_fpu_begin(); > kernel_fpu_end(); > > (Unless of course the process is preempted or the like.) > > Currently the present situation makes the performance of the above > wildly different, since kernel_fpu_end() does something immediately. > > What about something like this: > - Add a tristate flag connected to task_struct (or in the global fpu > struct in the case that this happens in irq and there isn't a valid > current). > - On kernel_fpu_begin(), if the flag is 0, do the usual expensive > XSAVE stuff, and set the flag to 1. > - On kernel_fpu_begin(), if the flag is non-0, just set the flag to 1 > and return. > - On kernel_fpu_end(), if the flag is non-0, set the flag to 2. > (Otherwise WARN() or BUG() or something.) > - On context switch / preemption / etc away from the task, if the flag > is non-0, XRSTOR and such. It’s not that simple. First, these states need names, at least for thinking about. 0 is “user state in regs”. 1 is “kernel state active”. 2 is “nothing active”. The actual encoding will be something like TIF_XSTATE_UNLOADED: user state is not in regs. TIF_KERNEL_XSTATE: kernel is using FPU. And this fundamentally doubles the size of struct fpu. Tglx, that doubling-the-size-of-fpu makes me question the idea of letting the kernel use the fpu while preemptible. > - On context switch / preemption / etc back to the task, if the flag > is 1, XSAVE and such. If the flag is 2, set it to 0. > > Jason