Hi Ackerley, On Thu, 2024-10-03 at 22:32 +0100, Ackerley Tng wrote: > Elliot Berman <quic_eberman@xxxxxxxxxxx> writes: > >> On Tue, Sep 10, 2024 at 11:44:01PM +0000, Ackerley Tng wrote: >>> Since guest_memfd now supports mmap(), folios have to be prepared >>> before they are faulted into userspace. >>> >>> When memory attributes are switched between shared and private, the >>> up-to-date flags will be cleared. >>> >>> Use the folio's up-to-date flag to indicate being ready for the guest >>> usage and can be used to mark whether the folio is ready for shared OR >>> private use. >> >> Clearing the up-to-date flag also means that the page gets zero'd out >> whenever it transitions between shared and private (either direction). >> pKVM (Android) hypervisor policy can allow in-place conversion between >> shared/private. >> >> I believe the important thing is that sev_gmem_prepare() needs to be >> called prior to giving page to guest. In my series, I had made a >> ->prepare_inaccessible() callback where KVM would only do this part. >> When transitioning to inaccessible, only that callback would be made, >> besides the bookkeeping. The folio zeroing happens once when allocating >> the folio if the folio is initially accessible (faultable). >> >> From x86 CoCo perspective, I think it also makes sense to not zero >> the folio when changing faultiblity from private to shared: >> - If guest is sharing some data with host, you've wiped the data and >> guest has to copy again. >> - Or, if SEV/TDX enforces that page is zero'd between transitions, >> Linux has duplicated the work that trusted entity has already done. >> >> Fuad and I can help add some details for the conversion. Hopefully we >> can figure out some of the plan at plumbers this week. > > Zeroing the page prevents leaking host data (see function docstring for > kvm_gmem_prepare_folio() introduced in [1]), so we definitely don't want > to introduce a kernel data leak bug here. > > In-place conversion does require preservation of data, so for > conversions, shall we zero depending on VM type? > > + Gunyah: don't zero since ->prepare_inaccessible() is a no-op > + pKVM: don't zero > + TDX: don't zero > + SEV: AMD Architecture Programmers Manual 7.10.6 says there is no > automatic encryption and implies no zeroing, hence perform zeroing > + KVM_X86_SW_PROTECTED_VM: Doesn't have a formal definition so I guess > we could require zeroing on transition? Maybe for KVM_X86_SW_PROTECTED_VM we could make zero-ing configurable via some CREATE_GUEST_MEMFD flag, instead of forcing one specific behavior. For the "non-CoCo with direct map entries removed" VMs that we at AWS are going for, we'd like a VM type with host-controlled in-place conversions which doesn't zero on transitions, so if KVM_X86_SW_PROTECTED_VM ends up zeroing, we'd need to add another new VM type for that. Somewhat related sidenote: For VMs that allow inplace conversions and do not zero, we do not need to zap the stage-2 mappings on memory attribute changes, right? > This way, the uptodate flag means that it has been prepared (as in > sev_gmem_prepare()), and zeroed if required by VM type. > > Regarding flushing the dcache/tlb in your other question [2], if we > don't use folio_zero_user(), can we relying on unmapping within core-mm > to flush after shared use, and unmapping within KVM To flush after > private use? > > Or should flush_dcache_folio() be explicitly called on kvm_gmem_fault()? > > clear_highpage(), used in the non-hugetlb (original) path, doesn't flush > the dcache. Was that intended? > >> Thanks, >> Elliot >> >>> >>> <snip> > > [1] https://lore.kernel.org/all/20240726185157.72821-8-pbonzini@xxxxxxxxxx/ > [2] https://lore.kernel.org/all/diqz34ldszp3.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ Best, Patrick