On Fri, Aug 14, 2020 at 3:09 AM Chenyi Qiang <chenyi.qiang@xxxxxxxxx> wrote: > > > > On 8/14/2020 1:52 AM, Jim Mattson wrote: > > On Wed, Aug 12, 2020 at 9:54 PM Chenyi Qiang <chenyi.qiang@xxxxxxxxx> wrote: > >> > >> > >> > >> On 8/11/2020 8:05 AM, Jim Mattson wrote: > >>> On Fri, Aug 7, 2020 at 1:47 AM Chenyi Qiang <chenyi.qiang@xxxxxxxxx> wrote: > >>>> > >>>> PKS MSR passes through guest directly. Configure the MSR to match the > >>>> L0/L1 settings so that nested VM runs PKS properly. > >>>> > >>>> Signed-off-by: Chenyi Qiang <chenyi.qiang@xxxxxxxxx> > >>>> --- > > > >>>> + (!vmx->nested.nested_run_pending || > >>>> + !(vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PKRS))) > >>>> + vmcs_write64(GUEST_IA32_PKRS, vmx->nested.vmcs01_guest_pkrs); > >>> > >>> This doesn't seem right to me. On the target of a live migration, with > >>> L2 active at the time the snapshot was taken (i.e., > >>> vmx->nested.nested_run_pending=0), it looks like we're going to try to > >>> overwrite the current L2 PKRS value with L1's PKRS value (except that > >>> in this situation, vmx->nested.vmcs01_guest_pkrs should actually be > >>> 0). Am I missing something? > >>> > >> > >> We overwrite the L2 PKRS with L1's value when L2 doesn't support PKS. > >> Because the L1's VM_ENTRY_LOAD_IA32_PKRS is off, we need to migrate L1's > >> PKRS to L2. > > > > I'm thinking of the case where vmx->nested.nested_run_pending is > > false, and we are processing a KVM_SET_NESTED_STATE ioctl, yet > > VM_ENTRY_LOAD_IA32_PKRS *is* set in the vmcs12. > > > > Oh, I miss this case. What I'm still confused here is that the > restoration for GUEST_IA32_DEBUGCTL and GUEST_BNDCFGS have the same > issue, right? or I miss something. I take it back. This does work, assuming that userspace calls KVM_SET_MSRS before calling KVM_SET_NESTED_STATE. Assuming L2 is active when the checkpoint is taken, the MSR values saved will be the L2 values. When restoring the MSRs with KVM_SET_MSRS, the L2 MSR values will be written into vmcs01. They don't belong there, but we're never going to launch vmcs01 with those MSR values. Instead, when userspace calls KVM_SET_NESTED_STATE, those values will be transferred first to the vmcs01_<msr> fields of the vmx->nested struct, and then to vmcs02. This is subtle, and I don't think it's documented anywhere that KVM_SET_NESTED_STATE must be called after KVM_SET_MSRS. In fact, there are probably a number of dependencies among the various KVM_SET_* functions that aren't documented anywhere.