On Thu, Apr 20, 2023 at 8:47 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Wed, Apr 19, 2023, Atish Patra wrote: > > +int kvm_riscv_cove_vm_measure_pages(struct kvm *kvm, struct kvm_riscv_cove_measure_region *mr) > > +{ > > + struct kvm_cove_tvm_context *tvmc = kvm->arch.tvmc; > > + int rc = 0, idx, num_pages; > > + struct kvm_riscv_cove_mem_region *conf; > > + struct page *pinned_page, *conf_page; > > + struct kvm_riscv_cove_page *cpage; > > + > > + if (!tvmc) > > + return -EFAULT; > > + > > + if (tvmc->finalized_done) { > > + kvm_err("measured_mr pages can not be added after finalize\n"); > > + return -EINVAL; > > + } > > + > > + num_pages = bytes_to_pages(mr->size); > > + conf = &tvmc->confidential_region; > > + > > + if (!IS_ALIGNED(mr->userspace_addr, PAGE_SIZE) || > > + !IS_ALIGNED(mr->gpa, PAGE_SIZE) || !mr->size || > > + !cove_is_within_region(conf->gpa, conf->npages << PAGE_SHIFT, mr->gpa, mr->size)) > > + return -EINVAL; > > + > > + idx = srcu_read_lock(&kvm->srcu); > > + > > + /*TODO: Iterate one page at a time as pinning multiple pages fail with unmapped panic > > + * with a virtual address range belonging to vmalloc region for some reason. > > I've no idea what code you had, but I suspect the fact that vmalloc'd memory isn't > guaranteed to be physically contiguous is relevant to the panic. > Ahh. That makes sense. Thanks. > > + */ > > + while (num_pages) { > > + if (signal_pending(current)) { > > + rc = -ERESTARTSYS; > > + break; > > + } > > + > > + if (need_resched()) > > + cond_resched(); > > + > > + rc = get_user_pages_fast(mr->userspace_addr, 1, 0, &pinned_page); > > + if (rc < 0) { > > + kvm_err("Pinning the userpsace addr %lx failed\n", mr->userspace_addr); > > + break; > > + } > > + > > + /* Enough pages are not available to be pinned */ > > + if (rc != 1) { > > + rc = -ENOMEM; > > + break; > > + } > > + conf_page = alloc_page(GFP_KERNEL | __GFP_ZERO); > > + if (!conf_page) { > > + rc = -ENOMEM; > > + break; > > + } > > + > > + rc = cove_convert_pages(page_to_phys(conf_page), 1, true); > > + if (rc) > > + break; > > + > > + /*TODO: Support other pages sizes */ > > + rc = sbi_covh_add_measured_pages(tvmc->tvm_guest_id, page_to_phys(pinned_page), > > + page_to_phys(conf_page), SBI_COVE_PAGE_4K, > > + 1, mr->gpa); > > + if (rc) > > + break; > > + > > + /* Unpin the page now */ > > + put_page(pinned_page); > > + > > + cpage = kmalloc(sizeof(*cpage), GFP_KERNEL_ACCOUNT); > > + if (!cpage) { > > + rc = -ENOMEM; > > + break; > > + } > > + > > + cpage->page = conf_page; > > + cpage->npages = 1; > > + cpage->gpa = mr->gpa; > > + cpage->hva = mr->userspace_addr; > > Snapshotting the userspace address for the _source_ page can't possibly be useful. > Yeah. Currently, the hva in the kvm_riscv_cove_page is not used anywhere in the code. We can remove it. > > + cpage->is_mapped = true; > > + INIT_LIST_HEAD(&cpage->link); > > + list_add(&cpage->link, &tvmc->measured_pages); > > + > > + mr->userspace_addr += PAGE_SIZE; > > + mr->gpa += PAGE_SIZE; > > + num_pages--; > > + conf_page = NULL; > > + > > + continue; > > + } > > + srcu_read_unlock(&kvm->srcu, idx); > > + > > + if (rc < 0) { > > + /* We don't to need unpin pages as it is allocated by the hypervisor itself */ > > This comment makes no sense. The above code is doing all of the allocation and > pinning, which strongly suggests that KVM is the hypervisor. But this comment > implies that KVM is not the hypervisor. > I mean to say here the conf_page is allocated in the kernel using alloc_page. So no pinning/unpinning is required. It seems the comment is probably misleading & confusing at best. I will remove it. > And "pinned_page" is cleared unpinned in the loop after the page is added+measured, > which looks to be the same model as TDX where "pinned_page" is the source and > "conf_page" is gifted to the hypervisor. But on failure, e.g. when allocating > "conf_page", that reference is not put. > Thanks. Will fix it. > > + cove_delete_page_list(kvm, &tvmc->measured_pages, false); > > + /* Free the last allocated page for which conversion/measurement failed */ > > + kfree(conf_page); > > Assuming my guesses about how the architecture works are correct, this is broken > if sbi_covh_add_measured_pages() fails. The page has already been gifted to the Yeah. The last conf_page should be reclaimed as well if measured_pages fail at any point in the loop. All other allocated ones would be reclaimed as a part of cove_delete_page_list. > TSM by cove_convert_pages(), but there is no call to sbi_covh_tsm_reclaim_pages(), > which I'm guessing is necesary to transition the page back to a state where it can > be safely used by the host.