On Fri, Aug 10, 2018 at 10:33:00AM +0200, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
> I'm afraid this won't speed things up. XMM hypercalls are faster because
> hypervisor doesn't need to read data from guest's memory and write it
> there. Here, we actually do more.
>
> Another thing is kernel_fpu_begin(). In particular, in case FPU was
> initialized we'll have to save its state and restore it later. This is
> also expensive. I would even suggest we check
> &current->thread.fpu->initialized and do regular hypercall in case it
> is.
>
> Did you try to benchmark your solution? Use e.g. IPI benchmark with PV
> IPIs enabled: https://lkml.org/lkml/2017/12/11/364
> (this will always have FPU uninitialized, we will also need something
> else like PV TLB shootdown from a process using FPU).

Hi Vitaly.

I haven't benchmarked it yet. As you mentioned, the patch needs to be
polished to do less, in particular to avoid xsave/xrstor. Before adding
that complexity, I wanted to post the patch early and then benchmark it.
I should have marked it as RFC or v1.

I've been pulled away to other work for now. I'll resume benchmarking it
and add that complexity unless you (or someone else) get to it before me.

--
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>