On Sun, Aug 26, 2018 at 8:06 AM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote: > > Do you mean to say you intend to make kernel_fpu_end() and > > kernel_neon_end() only actually do something upon context switch, but > > not when it's actually called? So that multiple calls to > > kernel_fpu_begin() and kernel_neon_begin() can be made without > > penalty? > > On context switch and exit to user. That allows to keep those code pathes > fully preemptible. Still twisting my brain around the details. Just to make sure we're on the same page, the goal is so that this code: kernel_fpu_begin(); kernel_fpu_end(); kernel_fpu_begin(); kernel_fpu_end(); kernel_fpu_begin(); kernel_fpu_end(); kernel_fpu_begin(); kernel_fpu_end(); kernel_fpu_begin(); kernel_fpu_end(); kernel_fpu_begin(); kernel_fpu_end(); ... has the same performance as this code: kernel_fpu_begin(); kernel_fpu_end(); (Unless of course the process is preempted or the like.) Currently the present situation makes the performance of the above wildly different, since kernel_fpu_end() does something immediately. What about something like this: - Add a tristate flag connected to task_struct (or in the global fpu struct in the case that this happens in irq and there isn't a valid current). - On kernel_fpu_begin(), if the flag is 0, do the usual expensive XSAVE stuff, and set the flag to 1. - On kernel_fpu_begin(), if the flag is non-0, just set the flag to 1 and return. - On kernel_fpu_end(), if the flag is non-0, set the flag to 2. (Otherwise WARN() or BUG() or something.) - On context switch / preemption / etc away from the task, if the flag is non-0, XRSTOR and such. - On context switch / preemption / etc back to the task, if the flag is 1, XSAVE and such. If the flag is 2, set it to 0. Jason