On Tue, Sep 12, 2017 at 12:55 PM, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> On 12/09/2017 18:48, Peter Feiner wrote:
>>>>
>>>> Because update_permission_bitmask is actually the top item in the profile
>>>> for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
>>>> clock cycles, or up to 30%:
>>
>> This is a great improvement! Why not take it a step further and
>> compute the whole table once at module init time and be done with it?
>> There are only 5 extra input bits (nx, ept, smep, smap, wp),
>
> 4 actually, nx could be ignored (because unlike WP, the bit is reserved
> when nx is disabled). It is only handled for clarity.
>
>> so the
>> whole table would only take up (1 << 5) * 16 = 512 bytes. Moreover, if
>> you had 32 VMs on the host, you'd actually save memory!
>
> Indeed; my thought was to write a script or something to generate the
> tables at compile time, but doing it at module init time would be clever
> and easier.
>
> That said, the generated code for the function, right now, is pretty
> good. If it saved 1000 clock cycles per nested vmexit it would be very
> convincing, but if it were 50 or even 100 a bit less so.

ACK. I'm good with either approach :-)

Please consider this one Reviewed-By: Peter Feiner <pfeiner@xxxxxxxxxx>
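
For reference, the init-time table discussed above could look roughly like
the sketch below. It is plain userspace C, not kernel code, and every name
in it (perm_tables, compute_entry, init_perm_tables, lookup_perm_table) is
made up for illustration. In particular compute_entry() is only a
placeholder and does not reproduce what update_permission_bitmask actually
computes; the point is just the layout and the 512-byte size.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one bit each for nx, ept, smep, smap, wp (order is arbitrary). */
#define NR_MODE_BITS    5
#define NR_MODES        (1u << NR_MODE_BITS)
/* 16 one-byte entries per table, matching the "* 16" in the thread. */
#define NR_ENTRIES      16

static uint8_t perm_tables[NR_MODES][NR_ENTRIES];

/* (1 << 5) * 16 = 512 bytes in total, shared by all VMs. */
static_assert(sizeof(perm_tables) == 512, "table should be 512 bytes");

/*
 * Hypothetical placeholder for the real per-entry permission computation.
 * It only needs to be a pure function of (mode, idx) for the sketch to work.
 */
static uint8_t compute_entry(unsigned int mode, unsigned int idx)
{
	return (uint8_t)((mode * 31u + idx * 7u) & 0xffu);
}

/* Would run once, e.g. from the module init path. */
static void init_perm_tables(void)
{
	for (unsigned int mode = 0; mode < NR_MODES; mode++)
		for (unsigned int idx = 0; idx < NR_ENTRIES; idx++)
			perm_tables[mode][idx] = compute_entry(mode, idx);
}

/* Switching modes becomes a pointer selection instead of a recompute. */
static const uint8_t *lookup_perm_table(unsigned int mode)
{
	return perm_tables[mode & (NR_MODES - 1)];
}
```

If nx can indeed be ignored, NR_MODE_BITS drops to 4 and the table shrinks
to 256 bytes.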