On Tue, Nov 12, 2019 at 11:19:44AM +0100, Paolo Bonzini wrote:
> On 12/11/19 01:51, Dan Williams wrote:
> > An elevated page reference count for file mapped pages causes the
> > filesystem (for a dax mode file) to wait for that reference count to
> > drop to 1 before allowing the truncate to proceed.  For a page cache
> > backed file mapping (non-dax) the reference count is not considered in
> > the truncate path.  It does prevent the page from getting freed in the
> > page cache case, but the association to the file is lost for truncate.
>
> KVM support for file-backed guest memory is limited.  It is not
> completely broken, in fact cases such as hugetlbfs are in use routinely,
> but corner cases such as truncate aren't covered well indeed.

KVM's actual MMU should be ok since it coordinates with the mmu_notifier.

kvm_vcpu_map() is where KVM could run afoul of page cache truncation.
This is the other main use of hva_to_pfn*(), where KVM directly accesses
guest memory (which could be file-backed) without coordinating with the
mmu_notifier.  IIUC, an ill-timed page cache truncation could result in
a write from KVM effectively being dropped due to writeback racing with
KVM's write to the page.

If that's true, then I think KVM would need to move to the proposed
pin_user_pages() to ensure its "DMA" isn't lost.

> > As long as any memory the guest expects to be persistent is backed by
> > mmu-notifier coordination we're all good, otherwise an elevated
> > reference count does not coordinate with truncate in a reliable way.

KVM itself is (mostly) blissfully unaware of any such expectations.  The
userspace VMM, e.g. Qemu, is ultimately responsible for ensuring the
guest sees a valid model, e.g. that persistent memory (as presented to
the guest) is actually persistent (from the guest's perspective).  The
big caveat is the truncation issue above.