From: Andy Lutomirski [mailto:luto@xxxxxxxxxx]
> On Tue, Jan 15, 2019 at 12:54 PM Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
> >
> > On 1/15/19 12:26 PM, Andy Lutomirski wrote:
> > > I don't think we'd ever want kernel_fpu_end() to restore anything,
> > > right? I'm a bit confused as to when this optimization would actually
> > > be useful.
> >
> > Using AVX-512 as an example...
> >
> > Let's say there was AVX-512 state, and a kernel_fpu_begin() user only
> > used AVX2. We could totally avoid doing *any* AVX-512 state save/restore.
> >
> > The init optimization doesn't help us if there _is_ AVX-512 state, and
> > the modified optimization only helps if we recently did a XRSTOR at
> > context switch and have not written to AVX-512 state since XRSTOR.
> >
> > This probably only matters for AVX-512-using apps that have run on a
> > kernel with lots of kernel_fpu_begin()s that don't use AVX-512. So, not
> > a big deal right now.
>
> On top of this series, this gets rather awkward, I think -- now we
> need to be able to keep track of a state in which some of the user
> registers live in the CPU and some live in memory, and we need to be
> able to do the partial restore if we go back to user mode like this.
> We also need to be able to do a partial save if we end up context
> switching. This seems rather complicated.

If kernel_fpu_begin() requests registers that are 'live' for userspace,
or if the user registers have already been saved, then you (more or less)
have to disable pre-emption.

OTOH if the kernel wants the AVX2 registers and the user ones are all 0
then the kernel can just use the registers, provided kernel_fpu_end()
zeroes them.
In this case you can allow pre-emption because a context switch will save
everything and it will all get restored correctly (the restore will need
to happen when the process is scheduled, not on return to user).
The register save area might need zapping (if used) because it might be
readable from user space (by a debugger).

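As a rough illustration of the rule above, here is a user-space toy model (not kernel code). Every name in it is made up for the sketch, and it treats the register components as independent bits, which ignores the way the real XMM/YMM/ZMM registers alias each other:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical component bits, for illustration only. */
#define COMP_X87    (1u << 0)
#define COMP_AVX2   (1u << 1)
#define COMP_AVX512 (1u << 2)

/* Hypothetical per-task view of the user FPU state. */
struct toy_fpu_state {
    uint32_t user_live;   /* components holding non-zero user data in the CPU */
    bool     user_saved;  /* user registers were already written to memory */
};

/*
 * Pre-emption has to be disabled if the user registers have already
 * been saved, or if any requested component is live for userspace.
 */
bool toy_must_disable_preempt(const struct toy_fpu_state *st, uint32_t wanted)
{
    if (st->user_saved)
        return true;
    return (st->user_live & wanted) != 0;
}

/*
 * Otherwise the requested user registers are all zero: the kernel can
 * use them with pre-emption enabled, provided the 'end' call zeroes them.
 */
bool toy_must_zero_at_end(const struct toy_fpu_state *st, uint32_t wanted)
{
    return !toy_must_disable_preempt(st, wanted);
}
```

In this model a task with no live state that asks for COMP_AVX2 stays pre-emptible and only owes a zeroing at the end; a task whose state is live or already saved does not.
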
The other case is kernel code that guarantees to save and restore any
registers it uses (it might only want 2 registers for a CRC).
Such code can nest with other kernel users (e.g. in an ISR).
I'm not sure whether it needs a small 'save area' for fpu flags?
It might be worth adding such a structure to the interface - even if it
is currently a dummy structure.

	David
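
For what it's worth, the 'save area' idea could look roughly like the following. This is a user-space toy, not a proposed kernel API: struct toy_cpu stands in for the hardware registers, and kfpu_begin_few(), kfpu_end_few() and struct kfpu_save_area are hypothetical names invented for the sketch.

```c
#include <stdint.h>

/* Toy stand-in for the hardware: four "vector" registers plus flags. */
struct toy_cpu {
    uint64_t reg[4];
    uint32_t flags;
};

/* Hypothetical caller-provided save area; could start life as a dummy. */
struct kfpu_save_area {
    uint64_t reg[2];
    uint32_t flags;
};

/* Save only the two registers (and flags) the caller is about to use. */
void kfpu_begin_few(struct toy_cpu *cpu, struct kfpu_save_area *sa)
{
    sa->reg[0] = cpu->reg[0];
    sa->reg[1] = cpu->reg[1];
    sa->flags = cpu->flags;
}

/* Put back exactly what kfpu_begin_few() saved. */
void kfpu_end_few(struct toy_cpu *cpu, const struct kfpu_save_area *sa)
{
    cpu->reg[0] = sa->reg[0];
    cpu->reg[1] = sa->reg[1];
    cpu->flags = sa->flags;
}

/* Stand-in for an ISR that clobbers the same registers, then restores. */
uint64_t toy_isr(struct toy_cpu *cpu)
{
    struct kfpu_save_area sa;
    uint64_t result;

    kfpu_begin_few(cpu, &sa);
    cpu->reg[0] = 0x1111;
    cpu->reg[1] = 0x2222;
    cpu->flags = 1;
    result = cpu->reg[0] + cpu->reg[1];
    kfpu_end_few(cpu, &sa);
    return result;
}

/* Byte-sum "checksum" using reg[0]; the ISR fires halfway through. */
uint64_t toy_checksum(struct toy_cpu *cpu, const uint8_t *buf, unsigned len)
{
    struct kfpu_save_area sa;
    uint64_t result;
    unsigned i;

    kfpu_begin_few(cpu, &sa);
    cpu->reg[0] = 0;
    for (i = 0; i < len; i++) {
        cpu->reg[0] += buf[i];
        if (i == len / 2)
            toy_isr(cpu);   /* nested user; must leave reg[0] intact */
    }
    result = cpu->reg[0];
    kfpu_end_few(cpu, &sa);
    return result;
}
```

Because each user restores into its own on-stack save area, the nested call leaves the outer computation and the original register contents untouched.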