On Tue, Apr 30, 2024 at 04:24:50PM -0600, Alex Williamson wrote:

> > +static vm_fault_t vfio_pci_dma_buf_fault(struct vm_fault *vmf)
> > +{
> > +        struct vm_area_struct *vma = vmf->vma;
> > +        struct vfio_pci_dma_buf *priv = vma->vm_private_data;
> > +        pgoff_t pgoff = vmf->pgoff;
> > +
> > +        if (pgoff >= priv->nr_pages)
> > +                return VM_FAULT_SIGBUS;
> > +
> > +        return vmf_insert_pfn(vma, vmf->address,
> > +                              page_to_pfn(priv->pages[pgoff]));
> > +}
>
> How does this prevent the MMIO space from being mmap'd when disabled
> at the device? How is the mmap revoked when the MMIO becomes
> disabled? Is it part of the move protocol?

Yes, we should not have a mmap handler for dmabuf. vfio memory must be
mmapped in the normal way.

> > +static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf)
> > +{
> > +        struct vfio_pci_dma_buf *priv = dmabuf->priv;
> > +
> > +        /*
> > +         * Either this or vfio_pci_dma_buf_cleanup() will remove from the list.
> > +         * The refcount prevents both.
> > +         */
> > +        if (priv->vdev) {
> > +                release_p2p_pages(priv, priv->nr_pages);
> > +                kfree(priv->pages);
> > +                down_write(&priv->vdev->memory_lock);
> > +                list_del_init(&priv->dmabufs_elm);
> > +                up_write(&priv->vdev->memory_lock);
>
> Why are we acquiring and releasing the memory_lock write lock
> throughout when we're not modifying the device memory enable state?
> Ugh, we're using it to implicitly lock dmabufs_elm/dmabufs aren't we...

Not really implicitly, but yes the dmabufs list is locked by the
memory_lock.

> > +int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags,
> > +                                  struct vfio_device_feature_dma_buf __user *arg,
> > +                                  size_t argsz)
> > +{
> > +        struct vfio_device_feature_dma_buf get_dma_buf;
> > +        struct vfio_region_p2p_area *p2p_areas;
> > +        DEFINE_DMA_BUF_EXPORT_INFO(exp_info);
> > +        struct vfio_pci_dma_buf *priv;
> > +        int i, ret;
> > +
> > +        ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET,
> > +                                 sizeof(get_dma_buf));
> > +        if (ret != 1)
> > +                return ret;
> > +
> > +        if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf)))
> > +                return -EFAULT;
> > +
> > +        p2p_areas = memdup_array_user(&arg->p2p_areas,
> > +                                      get_dma_buf.nr_areas,
> > +                                      sizeof(*p2p_areas));
> > +        if (IS_ERR(p2p_areas))
> > +                return PTR_ERR(p2p_areas);
> > +
> > +        priv = kzalloc(sizeof(*priv), GFP_KERNEL);
> > +        if (!priv)
> > +                return -ENOMEM;
>
> p2p_areas is leaked.

What is this new p2p_areas thing? It wasn't in my patch..

> > +        exp_info.ops = &vfio_pci_dmabuf_ops;
> > +        exp_info.size = priv->nr_pages << PAGE_SHIFT;
> > +        exp_info.flags = get_dma_buf.open_flags;
>
> open_flags from userspace are unchecked.

Huh. That seems to be a dmabuf pattern. :\

> > +        exp_info.priv = priv;
> > +
> > +        priv->dmabuf = dma_buf_export(&exp_info);
> > +        if (IS_ERR(priv->dmabuf)) {
> > +                ret = PTR_ERR(priv->dmabuf);
> > +                goto err_free_pages;
> > +        }
> > +
> > +        /* dma_buf_put() now frees priv */
> > +        INIT_LIST_HEAD(&priv->dmabufs_elm);
> > +        down_write(&vdev->memory_lock);
> > +        dma_resv_lock(priv->dmabuf->resv, NULL);
> > +        priv->revoked = !__vfio_pci_memory_enabled(vdev);
> > +        vfio_device_try_get_registration(&vdev->vdev);
>
> I guess we're assuming this can't fail in the ioctl path of an open
> device?

Seems like a bug added here.. My version had this as vfio_device_get().
This stuff has probably changed since I wrote it.

> > +        list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs);
> > +        dma_resv_unlock(priv->dmabuf->resv);
>
> What was the purpose of locking this?

?
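
On the two points above - the unchecked open_flags and the unchecked
vfio_device_try_get_registration() - what I'd expect is roughly the
below. Only a sketch against the quoted hunk: the err_unwind label is
made up, the actual unwind (resv unlock, memory_lock, dma_buf_put())
is elided, and the exact flag allow-list is a policy call:

        /* Only the access-mode bits and O_CLOEXEC make sense for the
         * exported dmabuf file, refuse anything else from userspace. */
        if (get_dma_buf.open_flags & ~(O_ACCMODE | O_CLOEXEC))
                return -EINVAL;

        ...

        /* This can fail if the device is being unregistered, so check
         * it instead of silently continuing with no reference held. */
        if (!vfio_device_try_get_registration(&vdev->vdev)) {
                ret = -ENODEV;
                goto err_unwind;
        }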
> > +void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked)
> > +{
> > +        struct vfio_pci_dma_buf *priv;
> > +        struct vfio_pci_dma_buf *tmp;
> > +
> > +        lockdep_assert_held_write(&vdev->memory_lock);
> > +
> > +        list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
> > +                if (!get_file_rcu(&priv->dmabuf->file))
> > +                        continue;
>
> Does this indicate the file was closed?

Yes.. The original patch was clearer, Christian asked to open code it:

+ * Returns true if a reference was successfully obtained. The caller must
+ * interlock with the dmabuf's release function in some way, such as RCU, to
+ * ensure that this is not called on freed memory.

A description of how the locking is working should be put in a comment
above that code; a rough sketch of what I mean is below my sign-off.

> > @@ -623,6 +625,8 @@ static int vfio_basic_config_write(struct vfio_pci_core_device *vdev, int pos,
> >          *virt_cmd &= cpu_to_le16(~mask);
> >          *virt_cmd |= cpu_to_le16(new_cmd & mask);
> >
> > +        if (__vfio_pci_memory_enabled(vdev))
> > +                vfio_pci_dma_buf_move(vdev, false);
> >          up_write(&vdev->memory_lock);
> >  }
>
> FLR is also accessible through config space.

That needs fixing up.

> > @@ -1246,7 +1248,10 @@ static int vfio_pci_ioctl_reset(struct vfio_pci_core_device *vdev,
> >           */
> >          vfio_pci_set_power_state(vdev, PCI_D0);
> >
> > +        vfio_pci_dma_buf_move(vdev, true);
> >          ret = pci_try_reset_function(vdev->pdev);
> > +        if (__vfio_pci_memory_enabled(vdev))
> > +                vfio_pci_dma_buf_move(vdev, false);
> >          up_write(&vdev->memory_lock);
>
> What about runtime power management?

Yes, I somehow think it was added.

> > -static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags,
> > -                                       uuid_t __user *arg, size_t argsz)
> > +static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev,
> > +                                       u32 flags, uuid_t __user *arg,
> > +                                       size_t argsz)
> >  {
> > -        struct vfio_pci_core_device *vdev =
> > -                container_of(device, struct vfio_pci_core_device, vdev);
>
> Why is only this existing function updated? If the core device deref
> is common then apply it to all and do so in a separate patch. Thanks,

Hm, I think that was some rebasing issue.

> > +/**
> > + * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
> > + * region selected.
> > + *
> > + * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
> > + * etc. offset/length specify a slice of the region to create the dmabuf from.
> > + * If both are 0 then the whole region is used.
> > + *
> > + * Return: The fd number on success, -1 and errno is set on failure.
> > + */
> > +#define VFIO_DEVICE_FEATURE_DMA_BUF 11
> > +
> > +struct vfio_region_p2p_area {
> > +        __u32   region_index;
> > +        __u32   __pad;
> > +        __u64   offset;
> > +        __u64   length;
> > +};
> > +
> > +struct vfio_device_feature_dma_buf {
> > +        __u32   open_flags;
> > +        __u32   nr_areas;
> > +        struct vfio_region_p2p_area p2p_areas[];
> > +};

Still have no clue what this p2p_areas thing is. You want to create a
dmabuf out of a scatterlist? Why??

I'm also not sure of the use of the pci_p2pdma family of functions; it
is a bold step to make struct pages, and that isn't going to work in
quite a lot of cases. It is really hacky/wrong to immediately call
pci_alloc_p2pmem() to defeat the internal genalloc.

I'd rather we stick with the original design. Leon is working on DMA
API changes that should address half the issue.

Jason
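
For the locking comment I'm asking for above the get_file_rcu(),
something along these lines is what I have in mind - the wording is
only a sketch and assumes the structure of the quoted patch:

        list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) {
                /*
                 * Holding memory_lock means release() cannot unlink
                 * priv from vdev->dmabufs or free it under us, since
                 * release() takes memory_lock before list_del_init().
                 * get_file_rcu() then either obtains a file reference,
                 * which keeps the dmabuf alive while we revoke/restore
                 * it, or fails because the file refcount already hit
                 * zero and release() is in flight, in which case we
                 * skip the entry and leave the unlink to release().
                 */
                if (!get_file_rcu(&priv->dmabuf->file))
                        continue;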