Re: [PATCH RFC 05/12] iommufd: PFN handling for iopt_pages


On Fri, 2022-03-18 at 14:27 -0300, Jason Gunthorpe wrote:
> The top of the data structure provides an IO Address Space (IOAS) that is
> similar to a VFIO container. The IOAS allows map/unmap of memory into
> ranges of IOVA called iopt_areas. Domains and in-kernel users (like VFIO
> mdevs) can be attached to the IOAS to access the PFNs that those IOVA
> areas cover.
> 
> The IO Address Space (IOAS) data structure is composed of:
>  - struct io_pagetable holding the IOVA map
>  - struct iopt_areas representing populated portions of IOVA
>  - struct iopt_pages representing the storage of PFNs
>  - struct iommu_domain representing the IO page table in the system IOMMU
>  - struct iopt_pages_user representing in-kernel users of PFNs (i.e. VFIO
>    mdevs)
>  - struct xarray pinned_pfns holding a list of pages pinned by in-kernel
>    users
> 
> This patch introduces the lowest part of the data structure - the movement
> of PFNs in a tiered storage scheme:
>  1) iopt_pages::pinned_pfns xarray
>  2) An iommu_domain
>  3) The origin of the PFNs, i.e. the userspace pointer
> 
> PFNs have to be copied between all combinations of tiers, depending on the
> configuration.
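
As a reading aid, the fallthrough between the three tiers reads to me
roughly like the sketch below. iopt_pages_get_pfn() and its error
handling are my own illustration, not code from this patch:

#include <linux/iommu.h>
#include <linux/mm.h>
#include <linux/xarray.h>

/* Illustrative sketch only -- not a function from pages.c */
static unsigned long iopt_pages_get_pfn(struct iopt_pages *pages,
					struct iommu_domain *domain,
					unsigned long index,
					unsigned long iova)
{
	struct page *page;
	void *entry;

	/* Tier 1: the PFN was already pinned by an in-kernel user */
	entry = xa_load(&pages->pinned_pfns, index);
	if (entry)
		return xa_to_value(entry);

	/* Tier 2: the PFN is already mapped into an attached iommu_domain */
	if (domain)
		return iommu_iova_to_phys(domain, iova) >> PAGE_SHIFT;

	/* Tier 3: fall back to pinning from the original userspace VA */
	if (pin_user_pages_fast((unsigned long)pages->uptr + index * PAGE_SIZE,
				1, pages->writable ? FOLL_WRITE : 0,
				&page) != 1)
		return 0; /* error handling elided in this sketch */
	return page_to_pfn(page);
}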
> 
> The interface is an iterator called a 'pfn_reader' which determines which
> tier each PFN is stored in and loads it into a list of PFNs held in a
> struct pfn_batch.
> 
> Each step of the iterator will fill up the pfn_batch, then the caller can
> use the pfn_batch to send the PFNs to the required destination. Repeating
> this loop will read all the PFNs in an IOVA range.
> 
> The pfn_reader and pfn_batch also keep track of the pinned page accounting.
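
So a caller that wants to populate an iommu_domain presumably ends up
with a loop along these lines (a sketch: the pfn_reader_*() and
batch_to_domain() names are my reading of pages.c, not a verbatim
caller from the series):

static int sketch_fill_domain(struct iopt_pages *pages,
			      struct iopt_area *area,
			      struct iommu_domain *domain,
			      unsigned long start_index,
			      unsigned long last_index)
{
	struct pfn_reader pfns;
	int rc;

	rc = pfn_reader_first(&pfns, pages, start_index, last_index);
	if (rc)
		return rc;

	while (!pfn_reader_done(&pfns)) {
		/* Each step leaves a filled pfn_batch to hand to a tier */
		batch_to_domain(&pfns.batch, domain, area,
				pfns.batch_start_index);
		rc = pfn_reader_next(&pfns);
		if (rc)
			break;
	}
	pfn_reader_destroy(&pfns);
	return rc;
}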
> 
> While PFNs are always stored and accessed as full PAGE_SIZE units, the
> iommu_domain tier can store them with a sub-page offset/length to support
> IOMMUs with a smaller IOPTE size than PAGE_SIZE.
> 
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> ---
>  drivers/iommu/iommufd/Makefile          |   3 +-
>  drivers/iommu/iommufd/io_pagetable.h    | 101 ++++
>  drivers/iommu/iommufd/iommufd_private.h |  20 +
>  drivers/iommu/iommufd/pages.c           | 723 ++++++++++++++++++++++++
>  4 files changed, 846 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/iommu/iommufd/io_pagetable.h
>  create mode 100644 drivers/iommu/iommufd/pages.c
> 
> 
---8<---
> +
> +/*
> + * This holds a pinned page list for multiple areas of IO address space. The
> + * pages always originate from a linear chunk of userspace VA. Multiple
> + * io_pagetable's, through their iopt_area's, can share a single iopt_pages
> + * which avoids multi-pinning and double accounting of page consumption.
> + *
> + * indexes in this structure are measured in PAGE_SIZE units, are 0 based from
> + * the start of the uptr and extend to npages. pages are pinned dynamically
> + * according to the intervals in the users_itree and domains_itree, npages
> + * records the current number of pages pinned.

This sounds wrong or at least badly named. If npages records the
current number of pages pinned then what does npinned record?

> + */
> +struct iopt_pages {
> +	struct kref kref;
> +	struct mutex mutex;
> +	size_t npages;
> +	size_t npinned;
> +	size_t last_npinned;
> +	struct task_struct *source_task;
> +	struct mm_struct *source_mm;
> +	struct user_struct *source_user;
> +	void __user *uptr;
> +	bool writable:1;
> +	bool has_cap_ipc_lock:1;
> +
> +	struct xarray pinned_pfns;
> +	/* Of iopt_pages_user::node */
> +	struct rb_root_cached users_itree;
> +	/* Of iopt_area::pages_node */
> +	struct rb_root_cached domains_itree;
> +};
> +
---8<---
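
Also, the "pinned dynamically according to the intervals" part reads to
me as the rule sketched below (illustrative only, using the generic
interval tree helpers; not a function from the patch):

#include <linux/interval_tree.h>

/*
 * Illustration of the dynamic pinning rule: a page index only has to
 * stay pinned while some in-kernel user or some attached domain still
 * covers it. Not code from this patch.
 */
static bool iopt_pages_index_in_use(struct iopt_pages *pages,
				    unsigned long index)
{
	return interval_tree_iter_first(&pages->users_itree, index, index) ||
	       interval_tree_iter_first(&pages->domains_itree, index, index);
}

Which is what prompts the npages/npinned question above.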



