On Wed, Nov 13, 2024, Sean Christopherson wrote:
> On Fri, Nov 08, 2024, Paolo Bonzini wrote:
> >  static void kvm_gmem_mark_prepared(struct file *file, pgoff_t index, struct folio *folio)
> >  {
> > -	folio_mark_uptodate(folio);
> > +	struct kvm_gmem_inode *i_gmem = (struct kvm_gmem_inode *)file->f_inode->i_private;
> > +	unsigned long *p = i_gmem->prepared + BIT_WORD(index);
> > +	unsigned long npages = folio_nr_pages(folio);
> > +
> > +	/* Folios must be naturally aligned */
> > +	WARN_ON_ONCE(index & (npages - 1));
> > +	index &= ~(npages - 1);
> > +
> > +	/* Clear page before updating bitmap. */
> > +	smp_wmb();
> > +
> > +	if (npages < BITS_PER_LONG) {
> > +		bitmap_set_atomic_word(p, index, npages);
> > +	} else {
> > +		BUILD_BUG_ON(BITS_PER_LONG != 64);
> > +		memset64((u64 *)p, ~0, BITS_TO_LONGS(npages));
>
> More code that could be deduplicated (unprepared path below).
>
> But more importantly, I'm pretty sure the entire lockless approach is unsafe.
>
> Callers of kvm_gmem_get_pfn() do not take any locks that protect
> kvm_gmem_mark_prepared() from racing with kvm_gmem_mark_range_unprepared().
> kvm_mmu_invalidate_begin() prevents KVM from installing a stage-2 mapping,
> i.e. from consuming the PFN, but it doesn't prevent kvm_gmem_mark_prepared()
> from being called. And sev_handle_rmp_fault() never checks mmu_notifiers,
> which is probably fine? But sketchy.
>
> Oof. Side topic. sev_handle_rmp_fault() is suspect for other reasons. It does
> its own lookup of the PFN, and so could see an entirely different PFN than was
> resolved by kvm_mmu_page_fault(). And thanks to KVM's wonderful 0/1/-errno
> behavior, sev_handle_rmp_fault() is invoked even when KVM wants to retry the
> fault, e.g. due to an active MMU invalidation.
>
> Anyways, KVM wouldn't _immediately_ consume bad data, as the invalidation
> would block the current page fault. But clobbering i_gmem->prepared would
> result in a missed "prepare" phase if the hole-punch region were restored.
> One idea would be to use a rwlock to protect updates to the bitmap (setters
> can generally stomp on each other). And to avoid contention whenever possible
> in page fault paths, only take the lock if the page is not up-to-date, because
> again kvm_mmu_invalidate_{begin,end}() protects against UAF, it's purely
> updates to the bitmap that need extra protection.

Actually, there's no point in having a rwlock, because readers are serialized
on the folio's lock. And KVM absolutely relies on that already, because
otherwise multiple vCPUs could see an unprepared folio, and clear_highpage()
could end up racing with writes from the vCPU.

> Note, using mmu_invalidate_retry_gfn() is unsafe, because it must be called
> under mmu_lock to ensure it doesn't get false negatives.