On Tue, Jun 11, 2024, Paolo Bonzini wrote: > On 6/10/24 23:48, Paolo Bonzini wrote: > > On Sat, Jun 8, 2024 at 1:03 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > > SNP folks and/or Paolo, what's the plan for this? I don't see how what's sitting > > > in kvm/next can possibly be correct without conditioning population on the folio > > > being !uptodate. > > > > I don't think I have time to look at it closely until Friday; but > > thanks for reminding me. > > Ok, I'm officially confused. I think I understand what you did in your > suggested code. Limiting it to the bare minimum (keeping the callback > instead of CONFIG_HAVE_KVM_GMEM_INITIALIZE) it would be something > like what I include at the end of the message. > > But the discussion upthread was about whether to do the check for > RMP state in sev.c, or do it in common code using folio_mark_uptodate(). > I am not sure what you mean by "cannot possibly be correct", and > whether it's referring to kvm_gmem_populate() in general or the > callback in sev_gmem_post_populate(). Doing fallocate() before KVM_SEV_SNP_LAUNCH_UPDATE will cause the latter to fail. That likely works for QEMU, at least for now, but it's unnecessarily restrictive and IMO incorrect/wrong. E.g. a more convoluted, fallocate() + PUNCH_HOLE + KVM_SEV_SNP_LAUNCH_UPDATE will work (I think? AFAICT adding and removing pages directly to/from the RMP doesn't affect SNP's measurement, only pages that are added via SNP_LAUNCH_UPDATE affect the measurement). Punting the sanity check to vendor code is also gross and will make it harder to provide a consistent, unified ABI for all architectures. E.g. SNP returns -EINVAL if the page is already assigned, which is quite misleading. > The change below looks like just an optimization to me, which > suggests that I'm missing something glaring. I really dislike @prepare. There are two paths that should actually initialize the contents of the folio, and they are mutually exclusive and have meaningfully different behavior. Faulting in memory via kvm_gmem_get_pfn() explicitly zeros the folio _if necessary_, whereas kvm_gmem_populate() initializes the folio with user-provided data _and_ requires that the folio be !uptodate. If we fix the above oddity where fallocate() initializes memory, then there's no need to try and handle the initialization in a common chokepoint as the two relevant paths will naturally have unique code. The below is also still suboptimal for TDX, as KVM will zero the memory and then TDX-module will also zero memory on PAGE.AUGA. And I find SNP to be the odd one. IIUC, the ASP (the artist formerly known as the PSP) doesn't provide any guarantees about the contents of a page that is assigned to a guest without bouncing through SNP_LAUNCH_UPDATE. It'd be nice to explicitly document that somewhere in the SNP code. E.g. if guest_memfd invokes a common kvm_gmem_initialize_folio() or whatever, then SNP's implementation can clearly capture that KVM zeros the page to protect the _host_ data. > diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c > index d4206e53a9c81..a0417ef5b86eb 100644 > --- a/virt/kvm/guest_memfd.c > +++ b/virt/kvm/guest_memfd.c > @@ -52,37 +52,39 @@ static int kvm_gmem_prepare_folio(struct inode *inode, pgoff_t index, struct fol > static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index, bool prepare) > { > struct folio *folio; > + int r; > /* TODO: Support huge pages. */ > folio = filemap_grab_folio(inode->i_mapping, index); > if (IS_ERR(folio)) > return folio; > - /* > - * Use the up-to-date flag to track whether or not the memory has been > - * zeroed before being handed off to the guest. There is no backing > - * storage for the memory, so the folio will remain up-to-date until > - * it's removed. > - * > - * TODO: Skip clearing pages when trusted firmware will do it when > - * assigning memory to the guest. > - */ > - if (!folio_test_uptodate(folio)) { > - unsigned long nr_pages = folio_nr_pages(folio); > - unsigned long i; > + if (prepare) { > + /* > + * Use the up-to-date flag to track whether or not the memory has > + * been handed off to the guest. There is no backing storage for > + * the memory, so the folio will remain up-to-date until it's > + * removed. > + * > + * Take the occasion of the first prepare operation to clear it. > + */ > + if (!folio_test_uptodate(folio)) { > + unsigned long nr_pages = folio_nr_pages(folio); > + unsigned long i; > - for (i = 0; i < nr_pages; i++) > - clear_highpage(folio_page(folio, i)); > + for (i = 0; i < nr_pages; i++) > + clear_highpage(folio_page(folio, i)); > + } > + > + r = kvm_gmem_prepare_folio(inode, index, folio); > + if (r < 0) > + goto err_unlock_put; > folio_mark_uptodate(folio); > - } > - > - if (prepare) { > - int r = kvm_gmem_prepare_folio(inode, index, folio); > - if (r < 0) { > - folio_unlock(folio); > - folio_put(folio); > - return ERR_PTR(r); > + } else { > + if (folio_test_uptodate(folio)) { > + r = -EEXIST; > + goto err_unlock_put; > } > }