On 2023-08-15 at 10:18:53 -0700, isaku.yamahata@xxxxxxxxx wrote:
> From: Michael Roth <michael.roth@xxxxxxx>
>
> TODO: add a CONFIG option that can be used to completely skip the arch
> invalidation loop and avoid __weak references for arch/platforms that
> don't need an additional invalidation hook.
>
> In some cases, like with SEV-SNP, guest memory needs to be updated in a
> platform-specific manner before it can be safely freed back to the host.
> Add hooks to wire up handling of this sort when freeing memory in
> response to FALLOC_FL_PUNCH_HOLE operations.
>
> Also issue invalidations of all allocated pages when releasing the gmem
> file so that the pages are not left in an unusable state when they get
> freed back to the host.
>
> Signed-off-by: Michael Roth <michael.roth@xxxxxxx>
> Link: https://lore.kernel.org/r/20230612042559.375660-3-michael.roth@xxxxxxx

[...]

> +/* Handle arch-specific hooks needed before releasing guarded pages. */
> +static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct file *file,
> +					   pgoff_t start, pgoff_t end)
> +{
> +	pgoff_t file_end = i_size_read(file_inode(file)) >> PAGE_SHIFT;
> +	pgoff_t index = start;
> +
> +	end = min(end, file_end);
> +
> +	while (index < end) {
> +		struct folio *folio;
> +		unsigned int order;
> +		struct page *page;
> +		kvm_pfn_t pfn;
> +
> +		folio = __filemap_get_folio(file->f_mapping, index,
> +					    FGP_LOCK, 0);
> +		if (!folio) {
> +			index++;
> +			continue;
> +		}
> +
> +		page = folio_file_page(folio, index);
> +		pfn = page_to_pfn(page);
> +		order = folio_order(folio);
> +
> +		kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));

Observed an issue here: the looked-up page may not be the first page of
the folio, so the range [pfn, pfn + (1ul << order)) can extend into the
next folio. Part of those pages then gets invalidated a second time when
the loop advances to the next folio. On TDX, this makes
TDH_PHYMEM_PAGE_WBINVD fail. (A standalone arithmetic sketch of the
overlap follows at the end of this mail.)

> +
> +		index = folio_next_index(folio);
> +		folio_unlock(folio);
> +		folio_put(folio);
> +
> +		cond_resched();
> +	}
> +}

My fix would be:

diff --git a/virt/kvm/guest_mem.c b/virt/kvm/guest_mem.c
index e629782d73d5..3665003c3746 100644
--- a/virt/kvm/guest_mem.c
+++ b/virt/kvm/guest_mem.c
@@ -155,7 +155,7 @@ static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,
 
 	while (index < end) {
 		struct folio *folio;
-		unsigned int order;
+		pgoff_t ntails;
 		struct page *page;
 		kvm_pfn_t pfn;
 
@@ -168,9 +168,9 @@ static void kvm_gmem_issue_arch_invalidate(struct kvm *kvm, struct inode *inode,
 
 		page = folio_file_page(folio, index);
 		pfn = page_to_pfn(page);
-		order = folio_order(folio);
+		ntails = folio_nr_pages(folio) - folio_page_idx(folio, page);
 
-		kvm_arch_gmem_invalidate(kvm, pfn, pfn + min((1ul << order), end - index));
+		kvm_arch_gmem_invalidate(kvm, pfn, pfn + min(ntails, end - index));
 
 		index = folio_next_index(folio);
 		folio_unlock(folio);

Thanks,
Yilun
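
To make the overlap concrete, here is a minimal standalone sketch of the
arithmetic (a plain C program with made-up pfn values, not kernel code;
folio_pfn, folio_nr and page_idx stand in for folio_pfn(folio),
folio_nr_pages(folio) and folio_page_idx(folio, page)):

#include <stdio.h>

int main(void)
{
	/* An order-2 folio (4 pages) starting at pfn 0x100, backing file
	 * indices 8..11. The punched range starts at index 10, i.e. the
	 * third page of the folio. */
	unsigned long folio_pfn = 0x100;
	unsigned long folio_nr  = 1ul << 2;	/* folio_nr_pages()  */
	unsigned long page_idx  = 2;		/* folio_page_idx()  */
	unsigned long pfn = folio_pfn + page_idx;	/* 0x102 */

	/* Old calculation: a full folio's worth of pages from a
	 * mid-folio pfn overshoots the folio boundary at 0x104. */
	unsigned long old_end = pfn + folio_nr;		/* 0x106 */

	/* Fixed calculation: only the remaining tail pages. */
	unsigned long ntails  = folio_nr - page_idx;	/* 2     */
	unsigned long new_end = pfn + ntails;		/* 0x104 */

	printf("old range: [0x%lx, 0x%lx) -- 0x104/0x105 belong to the next folio\n",
	       pfn, old_end);
	printf("new range: [0x%lx, 0x%lx) -- stops at the folio boundary\n",
	       pfn, new_end);
	return 0;
}

When the loop then advances to the next folio (first pfn 0x104), the old
code invalidates 0x104 and 0x105 a second time, which is exactly the
double invalidation that trips TDH_PHYMEM_PAGE_WBINVD on TDX.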