On 10/13/21 10:24 AM, Sean Christopherson wrote: > On Wed, Oct 13, 2021, Brijesh Singh wrote: >>> The more I look at this, the more strongly I feel that private <=> shared conversions >>> belong in the MMU, and that KVM's SPTEs should be the single source of truth for >>> shared vs. private. E.g. add a SPTE_TDP_PRIVATE_MASK in the software available bits. >>> I believe the only hiccup is the snafu where not zapping _all_ SPTEs on memslot >>> deletion breaks QEMU+VFIO+GPU, i.e. KVM would lose its canonical info on unrelated >>> memslot deletion. >>> >>> But that is a solvable problem. Ideally the bug, wherever it is, would be root >>> caused and fixed. I believe Peter (and Marc?) is going to work on reproducing >>> the bug. >> We have been also setting up VM with Qemu + VFIO + GPU usecase to repro >> the bug on AMD HW and so far we no luck in reproducing it. Will continue >> stressing the system to recreate it. Lets hope that Peter (and Marc) can >> easily recreate on Intel HW so that we can work towards fixing it. > Are you trying on a modern kernel? If so, double check that nx_huge_pages is off, > turning that on caused the bug to disappear. It should be off for AMD systems, > but it's worth checking. Yes, this is a recent kernel. I will double check the nx_huge_pages is off. >>>> + if (!rc) { >>>> + /* >>>> + * This may happen if another vCPU unmapped the page >>>> + * before we acquire the lock. Retry the PSC. >>>> + */ >>>> + write_unlock(&kvm->mmu_lock); >>>> + return 0; >>> How will the caller (guest?) know to retry the PSC if KVM returns "success"? >> If a guest is adhering to the GHCB spec then it will see that hypervisor >> has not processed all the entry and it should retry the PSC. > But AFAICT that information isn't passed to the guest. Even in this single-page > MSR-based case, the caller will say "all good" on a return of 0. > > The "full" path is more obvious, as the caller clearly continues to process > entries unless there's an actual failure. > > + for (; cur <= end; cur++) { > + entry = &info->entries[cur]; > + gpa = gfn_to_gpa(entry->gfn); > + level = RMP_TO_X86_PG_LEVEL(entry->pagesize); > + op = entry->operation; > + > + if (!IS_ALIGNED(gpa, page_level_size(level))) { > + rc = PSC_INVALID_ENTRY; > + goto out; > + } > + > + rc = __snp_handle_page_state_change(vcpu, op, gpa, level); > + if (rc) > + goto out; > + } > Please see the guest kernel patch #19 [1]. In spec there is no special code for the retry. The guest will look at the PSC hdr to determine how many entries were processed by the hypervisor (in this particular case a 0). And at time the guest can do whatever it wants. In the case of Linux guest, we retry the PSC. [1] https://lore.kernel.org/linux-mm/20211008180453.462291-20-brijesh.singh@xxxxxxx/ thanks