> From: Vishal Annapurve <vannapurve@xxxxxxxxxx>
> Sent: Tuesday, September 24, 2024 4:54 AM
>
> On Mon, Sep 23, 2024, 10:24 AM Tian, Kevin <kevin.tian@xxxxxxxxx> wrote:
> >
> > > From: Vishal Annapurve <vannapurve@xxxxxxxxxx>
> > > Sent: Monday, September 23, 2024 2:34 PM
> > >
> > > On Mon, Sep 23, 2024 at 7:36 AM Tian, Kevin <kevin.tian@xxxxxxxxx> wrote:
> > > >
> > > > > From: Vishal Annapurve <vannapurve@xxxxxxxxxx>
> > > > > Sent: Saturday, September 21, 2024 5:11 AM
> > > > >
> > > > > On Sun, Sep 15, 2024 at 11:08 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Aug 23, 2024 at 11:21:26PM +1000, Alexey Kardashevskiy wrote:
> > > > > > > IOMMUFD calls get_user_pages() for every mapping which will allocate
> > > > > > > shared memory instead of using private memory managed by the KVM and
> > > > > > > MEMFD.
> > > > > >
> > > > > > Please check this series, it is much more how I would expect this to
> > > > > > work. Use the guest memfd directly and forget about kvm in the iommufd
> > > > > > code:
> > > > > >
> > > > > > https://lore.kernel.org/r/1726319158-283074-1-git-send-email-steven.sistare@xxxxxxxxxx
> > > > > >
> > > > > > I would imagine you'd detect the guest memfd when accepting the FD and
> > > > > > then having some different path in the pinning logic to pin and get
> > > > > > the physical ranges out.
> > > > >
> > > > > According to the discussion at KVM microconference around hugepage
> > > > > support for guest_memfd [1], it's imperative that guest private memory
> > > > > is not long term pinned. Ideal way to implement this integration would
> > > > > be to support a notifier that can be invoked by guest_memfd when
> > > > > memory ranges get truncated so that IOMMU can unmap the corresponding
> > > > > ranges.
> > > > > Such a notifier should also get called during memory
> > > > > conversion, it would be interesting to discuss how conversion flow
> > > > > would work in this case.
> > > > >
> > > > > [1] https://lpc.events/event/18/contributions/1764/ (checkout the
> > > > > slide 12 from attached presentation)
> > > > >
> > > >
> > > > Most devices don't support I/O page fault hence can only DMA to long
> > > > term pinned buffers. The notifier might be helpful for in-kernel
> > > > conversion but as a basic requirement there needs a way for IOMMUFD
> > > > to call into guest memfd to request long term pinning for a given
> > > > range. That is how I interpreted "different path" in Jason's comment.
> > >
> > > Policy that is being aimed here:
> > > 1) guest_memfd will pin the pages backing guest memory for all users.
> > > 2) kvm_gmem_get_pfn users will get a locked folio with elevated
> > > refcount when asking for the pfn/page from guest_memfd. Users will
> > > drop the refcount and release the folio lock when they are done
> > > using/installing (e.g. in KVM EPT/IOMMU PT entries) it. This folio
> > > lock is supposed to be held for short durations.
> > > 3) Users can assume the pfn is around until they are notified by
> > > guest_memfd on truncation or memory conversion.
> > >
> > > Step 3 above is already followed by KVM EPT setup logic for CoCo VMs.
> > > TDX VMs especially need to have secure EPT entries always mapped (once
> > > faulted-in) while the guest memory ranges are private.
> >
> > 'faulted-in' doesn't work for device DMAs (w/o IOPF).
>
> faulted-in can be replaced with mapped-in for the context of IOMMU
> operations.
>
> >
> > and above is based on the assumption that CoCo VM will always
> > map/pin the private memory pages until a conversion happens.
>
> Host physical memory is pinned by the host software stack. If you are
> talking about arch specific logic in KVM, then the expectation again
> is that guest_memfd will give pinned memory to it's users.
sorry it's a typo. I meant the host does it for CoCo VM.

> >
> > Conversion is initiated by the guest so ideally the guest is responsible
> > for not leaving any in-fly DMAs to the page which is being converted.
> > From this angle it is fine for IOMMUFD to receive a notification from
> > guest memfd when such a conversion happens.
> >
> > But I'm not sure whether the TDX way is architectural or just an
> > implementation choice which could be changed later, or whether it
> > applies to other arch.
>
> All private memory accesses from TDX VMs go via Secure EPT. If host
> removes secure EPT entries without guest intervention then linux guest
> has a logic to generate a panic when it encounters EPT violation on
> private memory accesses [1].

Yeah, that sounds good.

> >
> > If that behavior cannot be guaranteed, then we may still need a way
> > for IOMMUFD to request long term pin.
>
> [1] https://elixir.bootlin.com/linux/v6.11/source/arch/x86/coco/tdx/tdx.c#L677
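
For readers following the thread, the notifier flow discussed above (users keep
a mapping until guest_memfd notifies them of truncation or conversion) can be
sketched as a small userspace model. This is only an illustration under stated
assumptions: every name below (gmem_model, iommu_model, gmem_register_notifier,
and so on) is hypothetical and is not a kernel, KVM, or IOMMUFD interface, and
page state is reduced to one flag per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model only: none of these names exist in the kernel. */
#define MAX_PAGES     16
#define MAX_NOTIFIERS 4

enum gmem_event { GMEM_TRUNCATE, GMEM_CONVERT };

/* Callback a user (e.g. an IOMMU mapping layer) registers with the model. */
typedef void (*gmem_invalidate_fn)(void *ctx, size_t start, size_t end,
                                   enum gmem_event ev);

struct gmem_notifier {
    gmem_invalidate_fn fn;
    void *ctx;
};

/* Toy stand-in for guest_memfd: per-page private flag plus notifier list. */
struct gmem_model {
    bool is_private[MAX_PAGES];
    struct gmem_notifier notifiers[MAX_NOTIFIERS];
    size_t nr_notifiers;
};

/* Toy stand-in for an IOMMU page table: one "mapped" flag per page. */
struct iommu_model {
    bool mapped[MAX_PAGES];
};

void gmem_init(struct gmem_model *g)
{
    for (size_t p = 0; p < MAX_PAGES; p++)
        g->is_private[p] = true;
    g->nr_notifiers = 0;
}

int gmem_register_notifier(struct gmem_model *g, gmem_invalidate_fn fn,
                           void *ctx)
{
    if (g->nr_notifiers == MAX_NOTIFIERS)
        return -1;
    g->notifiers[g->nr_notifiers].fn = fn;
    g->notifiers[g->nr_notifiers].ctx = ctx;
    g->nr_notifiers++;
    return 0;
}

/* Convert [start, end) to shared: notify every user first, then flip state. */
void gmem_convert_to_shared(struct gmem_model *g, size_t start, size_t end)
{
    assert(start <= end && end <= MAX_PAGES);
    for (size_t i = 0; i < g->nr_notifiers; i++)
        g->notifiers[i].fn(g->notifiers[i].ctx, start, end, GMEM_CONVERT);
    for (size_t p = start; p < end; p++)
        g->is_private[p] = false;
}

/* Notifier callback: drop the IOMMU mappings that cover [start, end). */
void iommu_invalidate(void *ctx, size_t start, size_t end, enum gmem_event ev)
{
    struct iommu_model *io = ctx;

    (void)ev; /* truncation and conversion are handled identically here */
    for (size_t p = start; p < end; p++)
        io->mapped[p] = false;
}

/* Map [start, end) for DMA; refuse if any page is no longer private. */
int iommu_map_private(struct iommu_model *io, const struct gmem_model *g,
                      size_t start, size_t end)
{
    if (start > end || end > MAX_PAGES)
        return -1;
    for (size_t p = start; p < end; p++)
        if (!g->is_private[p])
            return -1;
    for (size_t p = start; p < end; p++)
        io->mapped[p] = true;
    return 0;
}
```

In this model a caller would initialise a gmem_model, register
iommu_invalidate with an iommu_model as its context, map a private range, and
then observe that converting a sub-range removes exactly those mappings. The
model deliberately ignores in-flight DMA, locking, and refcounts, which are the
hard parts the thread is actually debating.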