On Thu, Oct 03, 2024, Ackerley Tng wrote: > Ackerley Tng <ackerleytng@xxxxxxxxxx> writes: > > > Elliot Berman <quic_eberman@xxxxxxxxxxx> writes: > >> From x86 CoCo perspective, I think it also makes sense to not zero > >> the folio when changing faultiblity from private to shared: > >> - If guest is sharing some data with host, you've wiped the data and > >> guest has to copy again. > >> - Or, if SEV/TDX enforces that page is zero'd between transitions, > >> Linux has duplicated the work that trusted entity has already done. > >> > >> Fuad and I can help add some details for the conversion. Hopefully we > >> can figure out some of the plan at plumbers this week. > > > > Zeroing the page prevents leaking host data (see function docstring for > > kvm_gmem_prepare_folio() introduced in [1]), so we definitely don't want > > to introduce a kernel data leak bug here. > > Actually it seems like filemap_grab_folio() already gets a zeroed page. > > filemap_grab_folio() eventually calls __alloc_pages_noprof() > -> get_page_from_freelist() > -> prep_new_page() > -> post_alloc_hook() > > and post_alloc_hook() calls kernel_init_pages(), which zeroes the page, > depending on kernel config. > > Paolo, was calling clear_highpage() in kvm_gmem_prepare_folio() zeroing an > already empty page returned from filemap_grab_folio()? Yes and no. CONFIG_INIT_ON_ALLOC_DEFAULT_ON and init_on_alloc are very much hardening features, not functional behavior that other code _needs_ to be aware of. E.g. enabling init-on-alloc comes with a measurable performance cost. Ignoring hardening, the guest_memfd mapping specifically sets the gfp_mask to GFP_HIGHUSER, i.e. doesn't set __GFP_ZERO. That said, I wouldn't be opposed to skipping the clear_highpage() call when want_init_on_alloc() is true. Also, the intended behavior (or at least, what intended) of kvm_gmem_prepare_folio() was it would do clear_highpage() if and only if a trusted entity does NOT zero the page. Factoring that in is a bit harder, as it probably requires another arch hook (or providing an out-param from kvm_arch_gmem_prepare()). I.e. the want_init_on_alloc() case isn't the only time KVM could shave cycles by not redundantly zeroing memory.