On Mon, Jul 15, 2024 at 10:27 PM Steven Sistare <steven.sistare@xxxxxxxxxx> wrote: > > On 7/14/2024 10:26 PM, Jason Wang wrote: > > On Fri, Jul 12, 2024 at 9:19 PM Steve Sistare <steven.sistare@xxxxxxxxxx> wrote: > >> > >> Add an ioctl to transfer file descriptor ownership and pinned memory > >> accounting from one process to another. > >> > >> This is more efficient than VHOST_RESET_OWNER followed by VHOST_SET_OWNER, > >> as that would unpin all physical pages, requiring them to be repinned in > >> the new process. That would cost multiple seconds for large memories, and > >> be incurred during a virtual machine's pause time during live update. > >> > >> Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx> > >> --- > >> drivers/vhost/vdpa.c | 41 ++++++++++++++++++++++++++++++++++++++ > >> drivers/vhost/vhost.c | 15 ++++++++++++++ > >> drivers/vhost/vhost.h | 1 + > >> include/uapi/linux/vhost.h | 10 ++++++++++ > >> 4 files changed, 67 insertions(+) > >> > >> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c > >> index b49e5831b3f0..5cf55ca4ec02 100644 > >> --- a/drivers/vhost/vdpa.c > >> +++ b/drivers/vhost/vdpa.c > >> @@ -632,6 +632,44 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v) > >> return ret; > >> } > >> > >> +static long vhost_vdpa_new_owner(struct vhost_vdpa *v) > >> +{ > >> + int r; > >> + struct vhost_dev *vdev = &v->vdev; > >> + struct mm_struct *mm_old = vdev->mm; > >> + struct mm_struct *mm_new = current->mm; > >> + long pinned_vm = v->pinned_vm; > >> + unsigned long lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK)); > >> + > >> + if (!mm_old) > >> + return -EINVAL; > >> + mmgrab(mm_old); > >> + > >> + if (!v->vdpa->use_va && > >> + pinned_vm + atomic64_read(&mm_new->pinned_vm) > lock_limit) { > >> + r = -ENOMEM; > >> + goto out; > >> + } > > > > So this seems to allow an arbitrary process to execute this. Seems to be unsafe. > > > > I wonder if we need to add some checks here, maybe PID or other stuff > > to only allow the owner process to do this. > > The original owner must send the file descriptor to the new owner. This seems not to be in the steps you put in the cover letter. > That constitutes permission to take ownership. This seems like a relaxed version of the reset_owner: Currently, reset_owner have the following check: /* Caller should have device mutex */ long vhost_dev_check_owner(struct vhost_dev *dev) { /* Are you the owner? If not, I don't think you mean to do that */ return dev->mm == current->mm ? 0 : -EPERM; } EXPORT_SYMBOL_GPL(vhost_dev_check_owner); It means even if the fd is passed to some other process, the reset owner won't work there. Thanks > > >> + r = vhost_vdpa_bind_mm(v, mm_new); > >> + if (r) > >> + goto out; > >> + > >> + r = vhost_dev_new_owner(vdev); > >> + if (r) { > >> + vhost_vdpa_bind_mm(v, mm_old); > >> + goto out; > >> + } > >> + > >> + if (!v->vdpa->use_va) { > >> + atomic64_sub(pinned_vm, &mm_old->pinned_vm); > >> + atomic64_add(pinned_vm, &mm_new->pinned_vm); > >> + } > >> + > >> +out: > >> + mmdrop(mm_old); > >> + return r; > >> +} > >> + > >> static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd, > >> void __user *argp) > >> { > >> @@ -876,6 +914,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep, > >> case VHOST_VDPA_RESUME: > >> r = vhost_vdpa_resume(v); > >> break; > >> + case VHOST_NEW_OWNER: > >> + r = vhost_vdpa_new_owner(v); > >> + break; > >> default: > >> r = vhost_dev_ioctl(&v->vdev, cmd, argp); > >> if (r == -ENOIOCTLCMD) > >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > >> index b60955682474..ab40ae50552f 100644 > >> --- a/drivers/vhost/vhost.c > >> +++ b/drivers/vhost/vhost.c > >> @@ -963,6 +963,21 @@ long vhost_dev_set_owner(struct vhost_dev *dev) > >> } > >> EXPORT_SYMBOL_GPL(vhost_dev_set_owner); > >> > >> +/* Caller should have device mutex */ > >> +long vhost_dev_new_owner(struct vhost_dev *dev) > >> +{ > >> + if (dev->mm == current->mm) > >> + return -EBUSY; > >> + > >> + if (!vhost_dev_has_owner(dev)) > >> + return -EINVAL; > >> + > >> + vhost_detach_mm(dev); > >> + vhost_attach_mm(dev); > > > > This seems to do nothing unless I miss something. > > vhost_detach mm drops dev->mm. > vhost_attach_mm grabs current->mm. > > - Steve >