On 12/09/2017 18:48, Peter Feiner wrote:
>>>
>>> Because update_permission_bitmask is actually the top item in the profile
>>> for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
>>> clock cycles, or up to 30%:
>
> This is a great improvement! Why not take it a step further and
> compute the whole table once at module init time and be done with it?
> There are only 5 extra input bits (nx, ept, smep, smap, wp),

4 actually, nx could be ignored (because unlike WP, the bit is reserved
when nx is disabled).  It is only handled for clarity.

> so the
> whole table would only take up (1 << 5) * 16 = 512 bytes. Moreover, if
> you had 32 VMs on the host, you'd actually save memory!

Indeed; my thought was to write a script or something to generate the tables
at compile time, but doing it at module init time would be clever and easier.

That said, the generated code for the function, right now, is pretty good.
If it saved 1000 clock cycles per nested vmexit it would be very convincing,
but if it were 50 or even 100 a bit less so.

Paolo
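
As a rough illustration of the module-init-time idea discussed above, here is
a minimal userspace sketch in C. It is not the KVM code. The names
perm_tables, compute_table, init_perm_tables and lookup_perm_table, and the
bit positions of the mode flags, are all hypothetical, and compute_table is a
dummy placeholder rather than the real permission logic. The sketch only shows
the table layout and the (1 << 5) * 16 = 512 byte size quoted in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define NR_MODE_BITS   5                     /* nx, ept, smep, smap, wp */
#define NR_MODES       (1u << NR_MODE_BITS)  /* 32 mode combinations */
#define PERM_TABLE_LEN 16                    /* bytes per table, per the thread */

/* Hypothetical bit assignment for the mode index (an assumption). */
enum {
    MODE_NX   = 1u << 0,
    MODE_EPT  = 1u << 1,
    MODE_SMEP = 1u << 2,
    MODE_SMAP = 1u << 3,
    MODE_WP   = 1u << 4,
};

/* One 16-byte table per mode combination, shared by every VM. */
static uint8_t perm_tables[NR_MODES][PERM_TABLE_LEN];

/* The size arithmetic from the thread, checked at compile time. */
static_assert(sizeof(perm_tables) == 512, "(1 << 5) * 16 should be 512 bytes");

/*
 * Placeholder for the real per-mode computation. The value written here
 * is a dummy pattern and carries no permission semantics.
 */
static void compute_table(unsigned mode, uint8_t out[PERM_TABLE_LEN])
{
    for (unsigned i = 0; i < PERM_TABLE_LEN; i++)
        out[i] = (uint8_t)(mode ^ i);
}

/* Fill every table once, e.g. from a module init hook. */
static void init_perm_tables(void)
{
    for (unsigned mode = 0; mode < NR_MODES; mode++)
        compute_table(mode, perm_tables[mode]);
}

/* At run time, changing mode is just picking a different row. */
static const uint8_t *lookup_perm_table(unsigned mode)
{
    return perm_tables[mode & (NR_MODES - 1)];
}
```

With this layout, init_perm_tables() would be called once at load time, and a
mode change would reduce to a pointer lookup instead of a recomputation.
Dropping nx from the index, as suggested above, would halve the table to
(1 << 4) * 16 = 256 bytes.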