> From: Tian, Kevin > Sent: Thursday, December 16, 2021 9:00 AM > > > > Off-list, Thomas mentioned doing it even at vCPU creation as long as the > > prctl has been called. That is also okay and even simpler. > > Make sense. It also avoids the #GP thing in the emulation path if just due > to reallocation error. > > We'll follow this direction to do a quick update/test. > After some study there are three opens which we'd like to sync here. Once they are closed we'll send out a new version very soon (hopefully tomorrow). 1) Have a full expanded buffer at vCPU creation There are two options. One is to directly allocate a big-enough buffer upon guest_fpu::perm in fpu_alloc_guest_fpstate(). There is no reallocation per-se thus most changes in this series are not required. The other is to keep the reallocation concept (thus all previous patches are kept) and still call a wrapper around __xfd_enable_feature() even at vCPU creation (e.g. right after fpu_init_guest_permissions()). This matches the fpu core assumption that fpstate for xfd features are dynamically allocated, though the actual calling point may not be dynamic. This also allows us to exploit doing expansion in KVM_SET_CPUID2 (see next). 2) Do expansion at vCPU creation or KVM_ SET_CPUID2? If the reallocation concept is still kept, then we feel doing expansion in KVM_SET_CPUID2 makes slightly more sense. There is no functional difference between two options since the guest is not running at this point. And in general Qemu should set prctl according to the cpuid bits. But since anyway we still need to check guest cpuid against guest perm in KVM_SET_CPUID2, it reads clearer to expand the buffer only after this check is passed. If this approach is agreed, then we may replace the helper functions in this patch with a new one: /* * fpu_update_guest_perm_features - Enable xfeatures according to guest perm * @guest_fpu: Pointer to the guest FPU container * * Enable all dynamic xfeatures according to guest perm. Invoked if the * caller wants to conservatively expand fpstate buffer instead of waiting * until a given feature is accessed. * * Return: 0 on success, error code otherwise */ +int fpu_update_guest_perm_features(struct fpu_guest *guest_fpu) +{ + u64 expand; + + lockdep_assert_preemption_enabled(); + + if (!IS_ENABLED(CONFIG_X86_64)) + return 0; + + expand = guest_fpu->perm & ~guest_fpu->xfeatures; + if (!expand) + return 0; + + return __xfd_enable_feature(expand, guest_fpu); +} +EXPORT_SYMBOL_GPL(fpu_update_guest_features); 3) Always disable interception of disable after 1st interception? Once we choose to have a full expanded buffer before guest runs, the point of intercepting WRMSR(IA32_XFD) becomes less obvious since no dynamic reallocation is required. One option is to always disable WRMSR interception once KVM_SET_CPUID2 succeeds, with the cost of one RDMSR per vm-exit. But doing so affects legacy OS which even has no XFD logic at all. The other option is to continue the current policy i.e. disable write emulation only after the 1st interception of setting XFD to a non-zero value. Then the RDMSR cost is added only for guest which supports XFD. In either case we need another helper to update guest_fpu->fpstate->xfd as required in the restore path. For the 2nd option we further want to update MSR if guest_fpu is currently in use: +void xfd_update_guest_xfd(struct fpu_guest *guest_fpu, u64 xfd) +{ + fpregs_lock(); + guest_fpu->fpstae->xfd = xfd; + if (guest_fpu->fpstate->in_use) + xfd_update_state(guest_fpu->fpstate); + fpregs_unlock(); +} Thoughts? -- p.s. currently we're working on a quick prototype based on: - Expand buffer in KVM_SET_CPUID2 - Disable write emulation after first interception to check any oversight. Thanks Kevin