RE: [PATCH] vfio: Support VFIO_NOIOMMU with iommufd

"Tian, Kevin" <kevin.tian@xxxxxxxxx> · Tue, 10 Jan 2023 06:20:33 +0000

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Monday, January 9, 2023 10:23 PM
>
>  /**
> - * iommufd_vfio_compat_ioas_id - Return the IOAS ID that vfio should use
> + * iommufd_vfio_compat_ioas_get_id - Ensure a comat IOAS exists

s/comat/compat/

> +/**
> + * iommufd_vfio_compat_ioas_create_id - Return the IOAS ID that vfio

'ID' is not returned in this case.

and it's slightly clearer to remove the trailing '_id' in the function name.

> @@ -235,6 +253,9 @@ static int iommufd_vfio_check_extension(struct
> iommufd_ctx *ictx,
>  	case VFIO_UNMAP_ALL:
>  		return 1;
> 
> +	case VFIO_NOIOMMU_IOMMU:
> +	return IS_ENABLED(CONFIG_VFIO_NOIOMMU);
> +

also check vfio_noiommu?

> @@ -264,6 +285,17 @@ static int iommufd_vfio_set_iommu(struct
> iommufd_ctx *ictx, unsigned long type)
>  	struct iommufd_ioas *ioas = NULL;
>  	int rc = 0;
>
> 
> +	/*
> +	 * Emulation for NOIMMU is imperfect in that VFIO blocks almost all

s/NOIMMU/NOIOMMU/

> +	 * other ioctls. We let them keep working but they mostly fail since no
> +	 * IOAS should exist.
> +	 */
> +	if (IS_ENABLED(CONFIG_VFIO_NOIOMMU) && type ==
> VFIO_NOIOMMU_IOMMU) {
> +		if (!capable(CAP_SYS_RAWIO))
> +			return -EPERM;
> +		return 0;
> +	}
> +

Another subtle difference. The container path has below check which applies
to noiommu:

	/*
	 * The container is designed to be an unprivileged interface while
	 * the group can be assigned to specific users.  Therefore, only by
	 * adding a group to a container does the user get the privilege of
	 * enabling the iommu, which may allocate finite resources.  There
	 * is no unset_iommu, but by removing all the groups from a container,
	 * the container is deprivileged and returns to an unset state.
	 */
	if (list_empty(&container->group_list) || container->iommu_driver) {
		up_write(&container->group_lock);
		return -EINVAL;
	}

> @@ -133,12 +133,13 @@ static int vfio_group_ioctl_set_container(struct
> vfio_group *group,
> 
>  	iommufd = iommufd_ctx_from_file(f.file);
>  	if (!IS_ERR(iommufd)) {
> -		u32 ioas_id;
> -
> -		ret = iommufd_vfio_compat_ioas_id(iommufd, &ioas_id);
> -		if (ret) {
> -			iommufd_ctx_put(group->iommufd);
> -			goto out_unlock;
> +		if (!IS_ENABLED(CONFIG_VFIO_NO_IOMMU) ||
> +		    group->type != VFIO_NO_IOMMU) {
> +			ret =
> iommufd_vfio_compat_ioas_create_id(iommufd);
> +			if (ret) {
> +				iommufd_ctx_put(group->iommufd);
> +				goto out_unlock;
> +			}

This doesn't prevent userspace from mixing noiommu and the check
in vfio_iommufd_bind() is partial which only ensures no compat ioas
when binding a noiommu device. It's still possible to bind a VFIO_IOMMU
type device and create the compat ioas afterwards.

somehow we need a sticky bit similar to container->noiommu.