On Wed, Feb 19, 2025 at 07:09:57PM -0600, Michael Roth wrote:
> On Mon, Feb 10, 2025 at 05:16:33PM -0800, Vishal Annapurve wrote:
> > On Wed, Dec 11, 2024 at 10:37 PM Michael Roth <michael.roth@xxxxxxx> wrote:
> > >
> > > This patchset is also available at:
> > >
> > >   https://github.com/amdese/linux/commits/snp-prepare-thp-rfc1
> > >
> > > and is based on top of Paolo's kvm-coco-queue-2024-11 tag which includes
> > > a snapshot of his patches[1] to provide tracking of whether or not
> > > sub-pages of a huge folio need to have kvm_arch_gmem_prepare() hooks issued
> > > before guest access:
> > >
> > >   d55475f23cea KVM: gmem: track preparedness a page at a time
> > >   64b46ca6cd6d KVM: gmem: limit hole-punching to ranges within the file
> > >   17df70a5ea65 KVM: gmem: add a complete set of functions to query page preparedness
> > >   e3449f6841ef KVM: gmem: allocate private data for the gmem inode
> > >
> > > [1] https://lore.kernel.org/lkml/20241108155056.332412-1-pbonzini@xxxxxxxxxx/
> > >
> > > This series addresses some of the pending review comments for those patches
> > > (feel free to squash/rework as-needed), and implements a first real user in
> > > the form of a reworked version of Sean's original 2MB THP support for gmem.
> > >
> >
> > Looking at the work targeted by Fuad to add in-place memory conversion
> > support via [1] and Ackerley in future to address hugetlb page
> > support, can the state tracking for preparedness be simplified as?
> > i) prepare guest memfd ranges when "first time an offset with
> > mappability = GUEST is allocated or first time an allocated offset has
> > mappability = GUEST". Some scenarios that would lead to guest memfd
> > range preparation:
> >   - Create file with default mappability to host, fallocate, convert
> >   - Create file with default mappability to Guest, guest faults on
> >     private memory
>
> Yes, this seems like a compelling approach.
> One aspect that still
> remains is knowing *when* the preparation has been done, so that the
> next time a private page is accessed, either to re-fault into the guest
> (e.g. because it was originally mapped 2MB and then a sub-page got
> converted to shared so the still-private pages need to get re-faulted
> in as 4K), or maybe some other path where KVM needs to grab the private
> PFN via kvm_gmem_get_pfn() but not actually read/write to it (I think
> the GHCB AP_CREATION path for bringing up APs might do this).
>
> We could just keep re-checking the RMP table to see if the PFN was
> already set to private in the RMP table, but I think one of the design
> goals of the preparedness tracking was to have gmem itself be aware of
> this and not farm it out to platform-specific data structures/tracking.
>
> So as a proof of concept I've been experimenting with using Fuad's
> series ([1] in your response) and adding an additional GUEST_PREPARED
> state so that it can be tracked via the same mappability xarray (or
> whatever data structure we end up using for mappability-tracking).
> In that case GUEST becomes sort of a transient state that can be set
> in advance of actual allocation/fault-time.

Hi Michael,

We are currently working on enabling 2M huge pages on TDX. We noticed
this series and hope it could also work with TDX huge pages.

While disallowing <2M page conversion is not ideal for TDX either, we
think it would be great to start with 2M and non-in-place conversion
first. In that case, is the memory fragmentation caused by partial
discarding a problem for you [1]? Is page promotion a must in your
initial huge page support?

Do you have any repo containing your latest POC?

Thanks
Yan

[1] https://lore.kernel.org/all/Z9PyLE%2FLCrSr2jCM@xxxxxxxxxxxxxxxxxxxxxxxxx/
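For discussion purposes, here is a rough userspace sketch of the state
machine Michael describes above (GUEST as a transient state set at
conversion time, GUEST_PREPARED recorded once the prepare hook has run,
so a later re-fault can skip it). All names and transitions here are
hypothetical stand-ins, not the actual kernel code; the real tracking
would live in the gmem mappability xarray:

```c
#include <assert.h>

/* Hypothetical per-offset states, mirroring the mappability tracking
 * discussed in the thread plus the proposed GUEST_PREPARED state. */
enum gmem_state {
	GMEM_HOST,		/* mappable by host (shared) */
	GMEM_GUEST,		/* private; prepare hook not yet issued */
	GMEM_GUEST_PREPARED,	/* private; prepare hook already issued */
};

/* Conversion to private: GUEST is transient and can be set ahead of
 * actual allocation/fault time. An already-prepared offset stays
 * prepared. */
static enum gmem_state gmem_convert_to_guest(enum gmem_state s)
{
	return (s == GMEM_GUEST_PREPARED) ? GMEM_GUEST_PREPARED : GMEM_GUEST;
}

/* Fault/allocation of a private offset: issue the prepare hook only
 * on the first access, then remember that in gmem itself so a re-fault
 * (e.g. 2M mapping re-faulted as 4K after a sub-page conversion) does
 * not need to consult platform-specific structures like the RMP table. */
static enum gmem_state gmem_prepare_if_needed(enum gmem_state s,
					      int *prepare_calls)
{
	if (s == GMEM_GUEST) {
		(*prepare_calls)++;	/* stand-in for kvm_arch_gmem_prepare() */
		return GMEM_GUEST_PREPARED;
	}
	return s;
}
```

The point of the sketch is only that the second fault on an already
prepared offset becomes a no-op with respect to the prepare hook, which
is the property the GUEST_PREPARED state is meant to provide.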