On Mon, Oct 29, 2018 at 07:33:43PM -0700, Isaku Yamahata wrote:
> On Mon, Oct 29, 2018 at 06:54:50PM +0000,
> Roman Kagan <rkagan@xxxxxxxxxxxxx> wrote:
>
> > On Wed, Oct 24, 2018 at 09:48:26PM -0700, Isaku Yamahata wrote:
> > > +/* ibytes = fixed header size + var header size + data size in bytes */
> > > +static inline u64 hv_do_xmm_fast_hypercall(
> > > +	u32 varhead_code, void *input, size_t ibytes,
> > > +	void *output, size_t obytes)
> > > +{
> > > +	u64 control = (u64)varhead_code | HV_HYPERCALL_FAST_BIT;
> > > +	u64 hv_status;
> > > +	u64 input1;
> > > +	u64 input2;
> > > +	size_t i_end = roundup(ibytes, 16);
> > > +	size_t o_end = i_end + roundup(obytes, 16);
> > > +	u64 *ixmm = (u64 *)input + 2;
> > > +	u64 tmp[(o_end - 16) / 8] __aligned((16));
> > > +
> > > +	BUG_ON(i_end <= 16);
> > > +	BUG_ON(o_end > HV_XMM_BYTE_MAX);
> > > +	BUG_ON(!IS_ALIGNED((unsigned long)input, 16));
> > > +	BUG_ON(!IS_ALIGNED((unsigned long)output, 16));
> > > +
> > > +	/* it's assumed that there are at least two inputs */
> > > +	input1 = ((u64 *)input)[0];
> > > +	input2 = ((u64 *)input)[1];
> > > +
> > > +	preempt_disable();
> >
> > Don't you rather need kernel_fpu_begin() here (paired with
> > kernel_fpu_end() at the end)?  This may affect your benchmark results
> > noticeably.
>
> You're right. For that reason, it's intentional to NOT use
> kernel_fpu_begin/end() for that reason. I'll add a comment on it.

How do you make sure you don't clobber task's fpu state then?

> > > -		res = hv_do_hypercall(HVCALL_RETARGET_INTERRUPT | (var_size << 17),
> > > -				      params, NULL);
> > > +		res = hv_do_hypercall(
> > > +			HVCALL_RETARGET_INTERRUPT | (var_size << 17),
> > > +			params, sizeof(*params) + var_size * 8, NULL, 0);
> >
> > This probably isn't performance-critical and can be left as is.
> > (Frankly I'm struggling to understand why this has to be a hypercall at
> > all.)
>
> If interrupt is pending, this hypercall gives VMM a chance to inject
> interrupt into VM.

Why wouldn't a regular VMBus message give VMM that chance?

Anyway, interrupt rebalancing is not an operation you do frequently, so I
don't see why you'd bother optimizing it.

Roman.
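
For reference, the size checks in the quoted helper are easy to try out in
userspace. The following is only a minimal sketch, not the patch's code: it
mirrors the roundup() and BUG_ON() arithmetic from the hunk above.
HV_XMM_BYTE_MAX is not shown in the quoted text, so the value 112 (two
general-purpose registers plus XMM0-XMM5) is an assumption, and roundup16(),
xmm_sizes_ok() and xmm_tmp_slots() are hypothetical names made up for the
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2 GPRs + XMM0-XMM5 = 7 * 16 bytes; not in the quoted hunk. */
#define HV_XMM_BYTE_MAX 112

/* Round n up to the next multiple of 16, like the kernel's roundup(n, 16). */
static size_t roundup16(size_t n)
{
	return (n + 15) / 16 * 16;
}

/*
 * Hypothetical helper mirroring the two size-related BUG_ON()s in the
 * quoted hv_do_xmm_fast_hypercall(): true if the sizes would pass them.
 */
static bool xmm_sizes_ok(size_t ibytes, size_t obytes)
{
	size_t i_end = roundup16(ibytes);
	size_t o_end = i_end + roundup16(obytes);

	if (i_end <= 16)		/* BUG_ON(i_end <= 16) */
		return false;
	if (o_end > HV_XMM_BYTE_MAX)	/* BUG_ON(o_end > HV_XMM_BYTE_MAX) */
		return false;
	return true;
}

/* Number of u64 slots in the quoted tmp[(o_end - 16) / 8] buffer. */
static size_t xmm_tmp_slots(size_t ibytes, size_t obytes)
{
	return (roundup16(ibytes) + roundup16(obytes) - 16) / 8;
}
```

With these assumptions, a 24-byte input with no output passes the checks and
needs two u64 slots in tmp[], while a 96-byte input with a 17-byte output
rounds up to 128 bytes and would trip the second BUG_ON().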