> From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
> Sent: Wednesday, August 16, 2023 8:14 PM
>
> Under nested IOMMU translation, userspace owns the stage-1 translation
> table (e.g. the stage-1 page table of Intel VT-d or the context table of
> ARM SMMUv3, and etc.). Stage-1 translation tables are vendor specific, and
> need to be compatible with the underlying IOMMU hardware. Hence, userspace
> should know the IOMMU hardware capability before creating and configuring
> the stage-1 translation table to kernel.
>
> This adds IOMMU_GET_HW_INFO ioctl to query the IOMMU hardware information
> (a.k.a capability) for a given device. The returned data is vendor
> specific, userspace needs to decode it with the structure by the output
> @out_data_type field.

"The format of the returned data is vendor specific and must be decoded
according to @out_data_type field".

> +
> +int iommufd_get_hw_info(struct iommufd_ucmd *ucmd)
> +{
> +	struct iommu_hw_info *cmd = ucmd->cmd;
> +	void __user *user_ptr = u64_to_user_ptr(cmd->data_uptr);
> +	const struct iommu_ops *ops;
> +	struct iommufd_device *idev;
> +	unsigned int data_len;
> +	unsigned int copy_len;
> +	void *data = NULL;
> +	int rc;
> +
> +	if (cmd->flags || cmd->__reserved)
> +		return -EOPNOTSUPP;
> +
> +	idev = iommufd_get_device(ucmd, cmd->dev_id);
> +	if (IS_ERR(idev))
> +		return PTR_ERR(idev);
> +
> +	ops = dev_iommu_ops(idev->dev);
> +	if (ops->hw_info) {
> +		data = ops->hw_info(idev->dev, &data_len, &cmd->out_data_type);
> +		if (IS_ERR(data)) {
> +			rc = PTR_ERR(data);
> +			goto err_put;
> +		}
> +
> +		/*
> +		 * drivers that have hw_info callback should have a unique
> +		 * iommu_hw_info_type.
> +		 */
> +		if (WARN_ON_ONCE(cmd->out_data_type ==
> +				 IOMMU_HW_INFO_TYPE_NONE)) {
> +			rc = -ENODEV;
> +			goto out;
> +		}
> +	} else {
> +		cmd->out_data_type = IOMMU_HW_INFO_TYPE_NONE;
> +		data_len = 0;
> +		data = NULL;

data is already initialized as NULL.
> +
> +	/*
> +	 * We return the length the kernel supports so userspace may know what
> +	 * the kernel capability is. It could be larger than the input buffer.
> +	 */
> +	cmd->data_len = data_len;
> +
> +	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
> +out:

out_free:

> +	kfree(data);
> +err_put:

out_put: (since this is also used in the success path)

> + * To capture an iommu type specific hardware information data, @data_uptr and
> + * its length @data_len must be provided. Trailing bytes will be zeroed if the
> + * user buffer is larger than the data that kernel has. Otherwise, kernel only
> + * fills the buffer using the given length in @data_len. If the ioctl succeeds,
> + * @data_len will be updated to the length that kernel actually supports,
> + * @out_data_type will be filled to decode the data filled in the buffer
> + * pointed by @data_uptr. Input @data_len == zero is allowed, no information
> + * data will be filled to user, but user space could get the iommu_hw_info_type
> + * filled in @out_data_type and the iommu hardware information data length
> + * supported by kernel filled in @data_len.

I'd just keep "Input @data_len == zero is allowed" and remove the trailing
words, which only duplicate the preceding context.

Reviewed-by: Kevin Tian <kevin.tian@xxxxxxxxx>
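
For reference, the @data_len semantics described in the kernel-doc comment
above could be modelled in plain userspace C roughly as follows. This is only
a sketch under stated assumptions, not the kernel code: fill_hw_info() and its
parameter names are hypothetical, and memcpy()/memset() stand in for whatever
user-copy helpers the real implementation uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of the @data_len semantics (not kernel code).
 *
 * Copies at most user_len bytes of the kernel's data into user_buf, zeroes
 * any trailing bytes when the user buffer is larger than the kernel data,
 * and returns the length the kernel supports, which may exceed user_len.
 * user_len == 0 is allowed; nothing is written in that case.
 */
static size_t fill_hw_info(void *user_buf, size_t user_len,
			   const void *kern_data, size_t kern_len)
{
	size_t copy_len = user_len < kern_len ? user_len : kern_len;

	/* A NULL buffer is only acceptable together with a zero length. */
	assert(user_buf || !user_len);

	if (copy_len)
		memcpy(user_buf, kern_data, copy_len);
	if (user_len > copy_len)
		memset((char *)user_buf + copy_len, 0, user_len - copy_len);

	return kern_len;
}
```

With this model a caller can pass a zero length first to learn the supported
size, then retry with a buffer of that size. The single return value covers
both cases, which is why the extra wording about the zero-length case looks
redundant.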