> From: Liu, Yi L <yi.l.liu@xxxxxxxxx> > Sent: Tuesday, March 31, 2020 9:22 PM > > > From: Tian, Kevin <kevin.tian@xxxxxxxxx> > > Sent: Tuesday, March 31, 2020 1:41 PM > > To: Liu, Yi L <yi.l.liu@xxxxxxxxx>; alex.williamson@xxxxxxxxxx; > > eric.auger@xxxxxxxxxx > > Subject: RE: [PATCH v1 1/8] vfio: Add > VFIO_IOMMU_PASID_REQUEST(alloc/free) > > > > > From: Liu, Yi L <yi.l.liu@xxxxxxxxx> > > > Sent: Monday, March 30, 2020 10:37 PM > > > > > > > From: Tian, Kevin <kevin.tian@xxxxxxxxx> > > > > Sent: Monday, March 30, 2020 4:32 PM > > > > To: Liu, Yi L <yi.l.liu@xxxxxxxxx>; alex.williamson@xxxxxxxxxx; > > > > Subject: RE: [PATCH v1 1/8] vfio: Add > > > VFIO_IOMMU_PASID_REQUEST(alloc/free) > > > > > > > > > From: Liu, Yi L <yi.l.liu@xxxxxxxxx> > > > > > Sent: Sunday, March 22, 2020 8:32 PM > > > > > > > > > > From: Liu Yi L <yi.l.liu@xxxxxxxxx> > > > > > > > > > > For a long time, devices have only one DMA address space from > > > > > platform IOMMU's point of view. This is true for both bare metal > > > > > and directed- access in virtualization environment. Reason is the > > > > > source ID of DMA in PCIe are BDF (bus/dev/fnc ID), which results > > > > > in only device granularity > > > > > > > > are->is > > > > > > thanks. > > > > > > > > DMA isolation. However, this is changing with the latest > > > > > advancement in I/O technology area. More and more platform > vendors > > > > > are utilizing the > > > PCIe > > > > > PASID TLP prefix in DMA requests, thus to give devices with > > > > > multiple DMA address spaces as identified by their individual > > > > > PASIDs. For example, Shared Virtual Addressing (SVA, a.k.a Shared > > > > > Virtual Memory) is able to let device access multiple process > > > > > virtual address space by binding the > > > > > > > > "address space" -> "address spaces" > > > > > > > > "binding the" -> "binding each" > > > > > > will correct both. > > > > > > > > virtual address space with a PASID. Wherein the PASID is allocated > > > > > in software and programmed to device per device specific manner. > > > > > Devices which support PASID capability are called PASID-capable > > > > > devices. If such devices are passed through to VMs, guest software > > > > > are also able to bind guest process virtual address space on such > > > > > devices. Therefore, the guest software could reuse the bare metal > > > > > software programming model, > > > which > > > > > means guest software will also allocate PASID and program it to > > > > > device directly. This is a dangerous situation since it has > > > > > potential PASID conflicts and unauthorized address space access. > > > > > It would be safer to let host intercept in the guest software's > > > > > PASID allocation. Thus PASID are managed system-wide. > > > > > > > > > > This patch adds VFIO_IOMMU_PASID_REQUEST ioctl which aims to > > > > > passdown PASID allocation/free request from the virtual IOMMU. > > > > > Additionally, such > > > > > > > > "Additionally, because such" > > > > > > > > > requests are intended to be invoked by QEMU or other applications > > > which > > > > > > > > simplify to "intended to be invoked from userspace" > > > > > > got it. > > > > > > > > are running in userspace, it is necessary to have a mechanism to > > > > > prevent single application from abusing available PASIDs in > > > > > system. With such consideration, this patch tracks the VFIO PASID > > > > > allocation per-VM. There was a discussion to make quota to be per > > > > > assigned devices. e.g. if a VM has many assigned devices, then it > > > > > should have more quota. However, it is not sure how many PASIDs an > > > > > assigned devices will use. e.g. it is > > > > > > > > devices -> device > > > > > > got it. > > > > > > > > possible that a VM with multiples assigned devices but requests > > > > > less PASIDs. Therefore per-VM quota would be better. > > > > > > > > > > This patch uses struct mm pointer as a per-VM token. We also > > > > > considered using task structure pointer and vfio_iommu structure > > > > > pointer. However, task structure is per-thread, which means it > > > > > cannot achieve per-VM PASID alloc tracking purpose. While for > > > > > vfio_iommu structure, it is visible only within vfio. Therefore, > > > > > structure mm pointer is selected. This patch adds a structure > > > > > vfio_mm. A vfio_mm is created when the first vfio container is > > > > > opened by a VM. On the reverse order, vfio_mm is free when the > > > > > last vfio container is released. Each VM is assigned with a PASID > > > > > quota, so that it is not able to request PASID beyond its quota. > > > > > This patch adds a default quota of 1000. This quota could be tuned > > > > > by administrator. Making PASID quota tunable will be added in > > > > > another > > > patch > > > > > in this series. > > > > > > > > > > Previous discussions: > > > > > https://patchwork.kernel.org/patch/11209429/ > > > > > > > > > > Cc: Kevin Tian <kevin.tian@xxxxxxxxx> > > > > > CC: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > > > > > Cc: Alex Williamson <alex.williamson@xxxxxxxxxx> > > > > > Cc: Eric Auger <eric.auger@xxxxxxxxxx> > > > > > Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx> > > > > > Signed-off-by: Liu Yi L <yi.l.liu@xxxxxxxxx> > > > > > Signed-off-by: Yi Sun <yi.y.sun@xxxxxxxxxxxxxxx> > > > > > Signed-off-by: Jacob Pan <jacob.jun.pan@xxxxxxxxxxxxxxx> > > > > > --- > > > > > drivers/vfio/vfio.c | 130 > > > > > ++++++++++++++++++++++++++++++++++++++++ > > > > > drivers/vfio/vfio_iommu_type1.c | 104 > > > > > ++++++++++++++++++++++++++++++++ > > > > > include/linux/vfio.h | 20 +++++++ > > > > > include/uapi/linux/vfio.h | 41 +++++++++++++ > > > > > 4 files changed, 295 insertions(+) > > > > > > > > > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index > > > > > c848262..d13b483 100644 > > > > > --- a/drivers/vfio/vfio.c > > > > > +++ b/drivers/vfio/vfio.c > > > > > @@ -32,6 +32,7 @@ > > > > > #include <linux/vfio.h> > > > > > #include <linux/wait.h> > > > > > #include <linux/sched/signal.h> > > > > > +#include <linux/sched/mm.h> > > > > > > > > > > #define DRIVER_VERSION "0.3" > > > > > #define DRIVER_AUTHOR "Alex Williamson > > > > > <alex.williamson@xxxxxxxxxx>" > > > > > @@ -46,6 +47,8 @@ static struct vfio { > > > > > struct mutex group_lock; > > > > > struct cdev group_cdev; > > > > > dev_t group_devt; > > > > > + struct list_head vfio_mm_list; > > > > > + struct mutex vfio_mm_lock; > > > > > wait_queue_head_t release_q; > > > > > } vfio; > > > > > > > > > > @@ -2129,6 +2132,131 @@ int vfio_unregister_notifier(struct device > > > *dev, > > > > > enum vfio_notify_type type, > > > > > EXPORT_SYMBOL(vfio_unregister_notifier); > > > > > > > > > > /** > > > > > + * VFIO_MM objects - create, release, get, put, search > > > > > > > > why capitalizing vfio_mm? > > > > > > oops, it's not intended, will fix it. > > > > > > > > + * Caller of the function should have held vfio.vfio_mm_lock. > > > > > + */ > > > > > +static struct vfio_mm *vfio_create_mm(struct mm_struct *mm) { > > > > > + struct vfio_mm *vmm; > > > > > + struct vfio_mm_token *token; > > > > > + int ret = 0; > > > > > + > > > > > + vmm = kzalloc(sizeof(*vmm), GFP_KERNEL); > > > > > + if (!vmm) > > > > > + return ERR_PTR(-ENOMEM); > > > > > + > > > > > + /* Per mm IOASID set used for quota control and group > operations > > > > > */ > > > > > + ret = ioasid_alloc_set((struct ioasid_set *) mm, > > > > > + VFIO_DEFAULT_PASID_QUOTA, &vmm- > > > > > >ioasid_sid); > > > > > + if (ret) { > > > > > + kfree(vmm); > > > > > + return ERR_PTR(ret); > > > > > + } > > > > > + > > > > > + kref_init(&vmm->kref); > > > > > + token = &vmm->token; > > > > > + token->val = mm; > > > > > + vmm->pasid_quota = VFIO_DEFAULT_PASID_QUOTA; > > > > > + mutex_init(&vmm->pasid_lock); > > > > > + > > > > > + list_add(&vmm->vfio_next, &vfio.vfio_mm_list); > > > > > + > > > > > + return vmm; > > > > > +} > > > > > + > > > > > +static void vfio_mm_unlock_and_free(struct vfio_mm *vmm) { > > > > > + /* destroy the ioasid set */ > > > > > + ioasid_free_set(vmm->ioasid_sid, true); > > > > > > > > do we need hold pasid lock here, since it attempts to destroy a set > > > > which might be referenced by vfio_mm_pasid_free? or is there > > > > guarantee that such race won't happen? > > > > > > Emmm, if considering the race between ioasid_free_set and > > > vfio_mm_pasid_free, I guess ioasid core should sequence the two > > > operations with its internal lock. right? > > > > I looked at below code in free path: > > > > + mutex_lock(&vmm->pasid_lock); > > + pdata = ioasid_find(vmm->ioasid_sid, pasid, NULL); > > + if (IS_ERR(pdata)) { > > + ret = PTR_ERR(pdata); > > + goto out_unlock; > > + } > > + ioasid_free(pasid); > > + > > +out_unlock: > > + mutex_unlock(&vmm->pasid_lock); > > > > above implies that ioasid_find/free must be paired within the pasid_lock. > > Then if we don't hold pasid_lock above, ioasid_free_set could happen > between > > find/free. I'm not sure whether this race would lead to real problem, but it > doesn't > > look correct simply by looking at this file. > > Well, Jacob told me to remove the ioasid_find in another email as he > believes ioasid core should be able to take care of it. and also need to > be protected by lock. If so, does it look good? :-) > > " [Jacob Pan] this might be better to put under ioasid code such that it > is under the ioasid lock. no one else can free the ioasid between find() and > free(). > e.g. ioasid_free(sid, pasid) > if sid == INVALID_IOASID_SET, then no set ownership checking. > thoughts?" yes, that way looks better. > > > > > > > > > + mutex_unlock(&vfio.vfio_mm_lock); > > > > > + kfree(vmm); > > > > > +} > > > > > + > > > > > +/* called with vfio.vfio_mm_lock held */ static void > > > > > +vfio_mm_release(struct kref *kref) { > > > > > + struct vfio_mm *vmm = container_of(kref, struct vfio_mm, > kref); > > > > > + > > > > > + list_del(&vmm->vfio_next); > > > > > + vfio_mm_unlock_and_free(vmm); > > > > > +} > > > > > + > > > > > +void vfio_mm_put(struct vfio_mm *vmm) { > > > > > + kref_put_mutex(&vmm->kref, vfio_mm_release, > > > > > &vfio.vfio_mm_lock); > > > > > +} > > > > > +EXPORT_SYMBOL_GPL(vfio_mm_put); > > > > > + > > > > > +/* Assume vfio_mm_lock or vfio_mm reference is held */ static > > > > > +void vfio_mm_get(struct vfio_mm *vmm) { > > > > > + kref_get(&vmm->kref); > > > > > +} > > > > > + > > > > > +struct vfio_mm *vfio_mm_get_from_task(struct task_struct *task) { > > > > > + struct mm_struct *mm = get_task_mm(task); > > > > > + struct vfio_mm *vmm; > > > > > + unsigned long long val = (unsigned long long) mm; > > > > > + > > > > > + mutex_lock(&vfio.vfio_mm_lock); > > > > > + list_for_each_entry(vmm, &vfio.vfio_mm_list, vfio_next) { > > > > > + if (vmm->token.val == val) { > > > > > + vfio_mm_get(vmm); > > > > > + goto out; > > > > > + } > > > > > + } > > > > > + > > > > > + vmm = vfio_create_mm(mm); > > > > > + if (IS_ERR(vmm)) > > > > > + vmm = NULL; > > > > > +out: > > > > > + mutex_unlock(&vfio.vfio_mm_lock); > > > > > + mmput(mm); > > > > > > > > I assume this has been discussed before, but from readability p.o.v > > > > it might be good to add a comment for this function to explain how > > > > the recording of mm in vfio_mm can be correctly removed when the > mm > > > > is being destroyed, since we don't hold a reference of mm here. > > > > > > yeah, I'll add it. > > > > > > > > + return vmm; > > > > > +} > > > > > +EXPORT_SYMBOL_GPL(vfio_mm_get_from_task); > > > > > + > > > > > +int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int max) { > > > > > + ioasid_t pasid; > > > > > + int ret = -ENOSPC; > > > > > + > > > > > + mutex_lock(&vmm->pasid_lock); > > > > > + > > > > > + pasid = ioasid_alloc(vmm->ioasid_sid, min, max, NULL); > > > > > + if (pasid == INVALID_IOASID) { > > > > > + ret = -ENOSPC; > > > > > + goto out_unlock; > > > > > + } > > > > > + > > > > > + ret = pasid; > > > > > +out_unlock: > > > > > + mutex_unlock(&vmm->pasid_lock); > > > > > + return ret; > > > > > +} > > > > > +EXPORT_SYMBOL_GPL(vfio_mm_pasid_alloc); > > > > > + > > > > > +int vfio_mm_pasid_free(struct vfio_mm *vmm, ioasid_t pasid) { > > > > > + void *pdata; > > > > > + int ret = 0; > > > > > + > > > > > + mutex_lock(&vmm->pasid_lock); > > > > > + pdata = ioasid_find(vmm->ioasid_sid, pasid, NULL); > > > > > + if (IS_ERR(pdata)) { > > > > > + ret = PTR_ERR(pdata); > > > > > + goto out_unlock; > > > > > + } > > > > > + ioasid_free(pasid); > > > > > + > > > > > +out_unlock: > > > > > + mutex_unlock(&vmm->pasid_lock); > > > > > + return ret; > > > > > +} > > > > > +EXPORT_SYMBOL_GPL(vfio_mm_pasid_free); > > > > > + > > > > > +/** > > > > > * Module/class support > > > > > */ > > > > > static char *vfio_devnode(struct device *dev, umode_t *mode) @@ > > > > > -2151,8 +2279,10 @@ static int __init vfio_init(void) > > > > > idr_init(&vfio.group_idr); > > > > > mutex_init(&vfio.group_lock); > > > > > mutex_init(&vfio.iommu_drivers_lock); > > > > > + mutex_init(&vfio.vfio_mm_lock); > > > > > INIT_LIST_HEAD(&vfio.group_list); > > > > > INIT_LIST_HEAD(&vfio.iommu_drivers_list); > > > > > + INIT_LIST_HEAD(&vfio.vfio_mm_list); > > > > > init_waitqueue_head(&vfio.release_q); > > > > > > > > > > ret = misc_register(&vfio_dev); > > > > > diff --git a/drivers/vfio/vfio_iommu_type1.c > > > > > b/drivers/vfio/vfio_iommu_type1.c index a177bf2..331ceee 100644 > > > > > --- a/drivers/vfio/vfio_iommu_type1.c > > > > > +++ b/drivers/vfio/vfio_iommu_type1.c > > > > > @@ -70,6 +70,7 @@ struct vfio_iommu { > > > > > unsigned int dma_avail; > > > > > bool v2; > > > > > bool nesting; > > > > > + struct vfio_mm *vmm; > > > > > }; > > > > > > > > > > struct vfio_domain { > > > > > @@ -2018,6 +2019,7 @@ static void > > > vfio_iommu_type1_detach_group(void > > > > > *iommu_data, > > > > > static void *vfio_iommu_type1_open(unsigned long arg) { > > > > > struct vfio_iommu *iommu; > > > > > + struct vfio_mm *vmm = NULL; > > > > > > > > > > iommu = kzalloc(sizeof(*iommu), GFP_KERNEL); > > > > > if (!iommu) > > > > > @@ -2043,6 +2045,10 @@ static void > > > *vfio_iommu_type1_open(unsigned > > > > > long arg) > > > > > iommu->dma_avail = dma_entry_limit; > > > > > mutex_init(&iommu->lock); > > > > > BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier); > > > > > + vmm = vfio_mm_get_from_task(current); > > > > > + if (!vmm) > > > > > + pr_err("Failed to get vfio_mm track\n"); > > > > > > > > I assume error should be returned when pr_err is used... > > > > > > got it. I didn't do it as I don’t think vfio_mm is necessary for > > > every iommu open. It is necessary for the nesting type iommu. I'll > > > make it fetch vmm when it is opening nesting type and return error > > > if failed. > > > > sounds good. > > > > > > > > > > + iommu->vmm = vmm; > > > > > > > > > > return iommu; > > > > > } > > > > > @@ -2084,6 +2090,8 @@ static void vfio_iommu_type1_release(void > > > > > *iommu_data) > > > > > } > > > > > > > > > > vfio_iommu_iova_free(&iommu->iova_list); > > > > > + if (iommu->vmm) > > > > > + vfio_mm_put(iommu->vmm); > > > > > > > > > > kfree(iommu); > > > > > } > > > > > @@ -2172,6 +2180,55 @@ static int > vfio_iommu_iova_build_caps(struct > > > > > vfio_iommu *iommu, > > > > > return ret; > > > > > } > > > > > > > > > > +static bool vfio_iommu_type1_pasid_req_valid(u32 flags) > > > > > > > > I don't think you need prefix "vfio_iommu_type1" for every new > > > > function here, especially for leaf internal function as this one. > > > > > > got it. thanks. > > > > > > > > +{ > > > > > + return !((flags & ~VFIO_PASID_REQUEST_MASK) || > > > > > + (flags & VFIO_IOMMU_PASID_ALLOC && > > > > > + flags & VFIO_IOMMU_PASID_FREE)); > > > > > +} > > > > > + > > > > > +static int vfio_iommu_type1_pasid_alloc(struct vfio_iommu *iommu, > > > > > + int min, > > > > > + int max) > > > > > +{ > > > > > + struct vfio_mm *vmm = iommu->vmm; > > > > > + int ret = 0; > > > > > + > > > > > + mutex_lock(&iommu->lock); > > > > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { > > > > > + ret = -EFAULT; > > > > > > > > why -EFAULT? > > > > > > well, it's from a prior comment as below: > > > vfio_mm_pasid_alloc() can return -ENOSPC though, so it'd be nice to > > > differentiate the errors. We could use EFAULT for the no IOMMU case > > > and EINVAL here? > > > http://lkml.iu.edu/hypermail/linux/kernel/2001.3/05964.html > > > > > > > > > > > > + goto out_unlock; > > > > > + } > > > > > + if (vmm) > > > > > + ret = vfio_mm_pasid_alloc(vmm, min, max); > > > > > + else > > > > > + ret = -EINVAL; > > > > > +out_unlock: > > > > > + mutex_unlock(&iommu->lock); > > > > > + return ret; > > > > > +} > > > > > + > > > > > +static int vfio_iommu_type1_pasid_free(struct vfio_iommu *iommu, > > > > > + unsigned int pasid) > > > > > +{ > > > > > + struct vfio_mm *vmm = iommu->vmm; > > > > > + int ret = 0; > > > > > + > > > > > + mutex_lock(&iommu->lock); > > > > > + if (!IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)) { > > > > > + ret = -EFAULT; > > > > > > > > ditto > > > > > > > > > + goto out_unlock; > > > > > + } > > > > > + > > > > > + if (vmm) > > > > > + ret = vfio_mm_pasid_free(vmm, pasid); > > > > > + else > > > > > + ret = -EINVAL; > > > > > +out_unlock: > > > > > + mutex_unlock(&iommu->lock); > > > > > + return ret; > > > > > +} > > > > > + > > > > > static long vfio_iommu_type1_ioctl(void *iommu_data, > > > > > unsigned int cmd, unsigned long arg) > > > > > { > > > > > @@ -2276,6 +2333,53 @@ static long vfio_iommu_type1_ioctl(void > > > > > *iommu_data, > > > > > > > > > > return copy_to_user((void __user *)arg, &unmap, minsz) ? > > > > > -EFAULT : 0; > > > > > + > > > > > + } else if (cmd == VFIO_IOMMU_PASID_REQUEST) { > > > > > + struct vfio_iommu_type1_pasid_request req; > > > > > + unsigned long offset; > > > > > + > > > > > + minsz = offsetofend(struct > vfio_iommu_type1_pasid_request, > > > > > + flags); > > > > > + > > > > > + if (copy_from_user(&req, (void __user *)arg, minsz)) > > > > > + return -EFAULT; > > > > > + > > > > > + if (req.argsz < minsz || > > > > > + !vfio_iommu_type1_pasid_req_valid(req.flags)) > > > > > + return -EINVAL; > > > > > + > > > > > + if (copy_from_user((void *)&req + minsz, > > > > > + (void __user *)arg + minsz, > > > > > + sizeof(req) - minsz)) > > > > > + return -EFAULT; > > > > > > > > why copying in two steps instead of copying them together? > > > > > > just want to do sanity check before copying all the data. I > > > can move it as one copy if it's better. :-) > > > > it's possible fine. I just saw you did same thing for other uapis. > > > > > > > > > > + > > > > > + switch (req.flags & VFIO_PASID_REQUEST_MASK) { > > > > > + case VFIO_IOMMU_PASID_ALLOC: > > > > > + { > > > > > + int ret = 0, result; > > > > > + > > > > > + result = > vfio_iommu_type1_pasid_alloc(iommu, > > > > > + > req.alloc_pasid.min, > > > > > + > req.alloc_pasid.max); > > > > > + if (result > 0) { > > > > > + offset = offsetof( > > > > > + struct > > > > > vfio_iommu_type1_pasid_request, > > > > > + alloc_pasid.result); > > > > > + ret = copy_to_user( > > > > > + (void __user *) (arg + > offset), > > > > > + &result, sizeof(result)); > > > > > + } else { > > > > > + pr_debug("%s: PASID alloc failed\n", > > > > > __func__); > > > > > + ret = -EFAULT; > > > > > > > > no, this branch is not for copy_to_user error. it is about pasid alloc > > > > failure. you should handle both. > > > > > > Emmm, I just want to fail the IOCTL in such case, so the @result field > > > is meaningless in the user side. How about using another return value > > > (e.g. ENOSPC) to indicate the IOCTL failure? > > > > If pasid_alloc fails, you return its result to userspace > > if copy_to_user fails, then return -EFAULT. > > > > however, above you return -EFAULT for pasid_alloc failure, and > > then the number of not-copied bytes for copy_to_user. > > not quite get. Let me re-paste the code. :-) > > + case VFIO_IOMMU_PASID_ALLOC: > + { > + int ret = 0, result; > + > + result = vfio_iommu_type1_pasid_alloc(iommu, > + req.alloc_pasid.min, > + req.alloc_pasid.max); > + if (result > 0) { > + offset = offsetof( > + struct > vfio_iommu_type1_pasid_request, > + alloc_pasid.result); > + ret = copy_to_user( > + (void __user *) (arg + offset), > + &result, sizeof(result)); > if copy_to_user failed, ret is the number of uncopied bytes and > will be returned to userspace to indicate failure. userspace will > not use the data in result field. perhaps, I should check the ret > here and return -EFAULT or another errno, instead of return the > number of un-copied bytes. here should return -EFAULT. > + } else { > + pr_debug("%s: PASID alloc failed\n", > __func__); > + ret = -EFAULT; > if vfio_iommu_type1_pasid_alloc() failed, no doubt, return -EFAULT > to userspace to indicate failure. pasid_alloc has its own error types returned. why blindly replace it with -EFAULT? > + } > + return ret; > + } > > is there still porblem here? > > > > > > > > + } > > > > > + return ret; > > > > > + } > > > > > + case VFIO_IOMMU_PASID_FREE: > > > > > + return vfio_iommu_type1_pasid_free(iommu, > > > > > + > req.free_pasid); > > > > > + default: > > > > > + return -EINVAL; > > > > > + } > > > > > } > > > > > > > > > > return -ENOTTY; > > > > > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > > > > > index e42a711..75f9f7f1 100644 > > > > > --- a/include/linux/vfio.h > > > > > +++ b/include/linux/vfio.h > > > > > @@ -89,6 +89,26 @@ extern int vfio_register_iommu_driver(const > struct > > > > > vfio_iommu_driver_ops *ops); > > > > > extern void vfio_unregister_iommu_driver( > > > > > const struct vfio_iommu_driver_ops *ops); > > > > > > > > > > +#define VFIO_DEFAULT_PASID_QUOTA 1000 > > > > > +struct vfio_mm_token { > > > > > + unsigned long long val; > > > > > +}; > > > > > + > > > > > +struct vfio_mm { > > > > > + struct kref kref; > > > > > + struct vfio_mm_token token; > > > > > + int ioasid_sid; > > > > > + /* protect @pasid_quota field and pasid allocation/free */ > > > > > + struct mutex pasid_lock; > > > > > + int pasid_quota; > > > > > + struct list_head vfio_next; > > > > > +}; > > > > > + > > > > > +extern struct vfio_mm *vfio_mm_get_from_task(struct task_struct > > > *task); > > > > > +extern void vfio_mm_put(struct vfio_mm *vmm); > > > > > +extern int vfio_mm_pasid_alloc(struct vfio_mm *vmm, int min, int > max); > > > > > +extern int vfio_mm_pasid_free(struct vfio_mm *vmm, ioasid_t > pasid); > > > > > + > > > > > /* > > > > > * External user API > > > > > */ > > > > > diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h > > > > > index 9e843a1..298ac80 100644 > > > > > --- a/include/uapi/linux/vfio.h > > > > > +++ b/include/uapi/linux/vfio.h > > > > > @@ -794,6 +794,47 @@ struct vfio_iommu_type1_dma_unmap { > > > > > #define VFIO_IOMMU_ENABLE _IO(VFIO_TYPE, VFIO_BASE + 15) > > > > > #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16) > > > > > > > > > > +/* > > > > > + * PASID (Process Address Space ID) is a PCIe concept which > > > > > + * has been extended to support DMA isolation in fine-grain. > > > > > + * With device assigned to user space (e.g. VMs), PASID alloc > > > > > + * and free need to be system wide. This structure defines > > > > > + * the info for pasid alloc/free between user space and kernel > > > > > + * space. > > > > > + * > > > > > + * @flag=VFIO_IOMMU_PASID_ALLOC, refer to the @alloc_pasid > > > > > + * @flag=VFIO_IOMMU_PASID_FREE, refer to @free_pasid > > > > > + */ > > > > > +struct vfio_iommu_type1_pasid_request { > > > > > + __u32 argsz; > > > > > +#define VFIO_IOMMU_PASID_ALLOC (1 << 0) > > > > > +#define VFIO_IOMMU_PASID_FREE (1 << 1) > > > > > + __u32 flags; > > > > > + union { > > > > > + struct { > > > > > + __u32 min; > > > > > + __u32 max; > > > > > + __u32 result; > > > > > > > > result->pasid? > > > > > > yes, the pasid allocated. > > > > > > > > > > > > + } alloc_pasid; > > > > > + __u32 free_pasid; > > > > > > > > what about putting a common pasid field after flags? > > > > > > looks good to me. But it would make the union part only meaningful > > > to alloc pasid. if so, maybe make the union part as a data field and > > > only alloc pasid will have it. For free pasid, it is not necessary > > > to read it from userspace. does it look good? > > > > maybe keeping the union is also OK, just with {min, max} for alloc. > > who knows whether more pasid ops will be added in the future > > which may require its specific union structure. ?? putting pasid > > as a common field is reasonable because the whole cmd is for > > pasid. > > got it. > > Thanks, > Yi Liu