Hi Jason,
On 2022/4/22 22:58, Jason Gunthorpe wrote:
On Thu, Apr 14, 2022 at 03:47:07AM -0700, Yi Liu wrote:
+static int vfio_get_devicefd(const char *sysfs_path, Error **errp)
+{
+ long int vfio_id = -1, ret = -ENOTTY;
+ char *path, *tmp = NULL;
+ DIR *dir;
+ struct dirent *dent;
+ struct stat st;
+ gchar *contents;
+ gsize length;
+ int major, minor;
+ dev_t vfio_devt;
+
+ path = g_strdup_printf("%s/vfio-device", sysfs_path);
+ if (stat(path, &st) < 0) {
+ error_setg_errno(errp, errno, "no such host device");
+ goto out;
+ }
+
+ dir = opendir(path);
+ if (!dir) {
+ error_setg_errno(errp, errno, "couldn't open directory %s", path);
+ goto out;
+ }
+
+ while ((dent = readdir(dir))) {
+ const char *end_name;
+
+ if (!strncmp(dent->d_name, "vfio", 4)) {
+ ret = qemu_strtol(dent->d_name + 4, &end_name, 10, &vfio_id);
+ if (ret) {
+ error_setg(errp, "suspicious vfio* file in %s", path);
+ goto out;
+ }
Userspace shouldn't explode if there are different files here down the
road. Just search for the first match of vfio\d+ and there is no need
to parse out the vfio_id from the string. Only fail if no match is
found.
+ tmp = g_strdup_printf("/dev/vfio/devices/vfio%ld", vfio_id);
+ if (stat(tmp, &st) < 0) {
+ error_setg_errno(errp, errno, "no such vfio device");
+ goto out;
+ }
And simply pass the string directly here, no need to parse out
vfio_id.
Got it, I'll follow both suggestions above.
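A minimal sketch of the simplified lookup, assuming the only requirement is
"first directory entry matching vfio\d+". The helper names
is_vfio_dev_name() and find_vfio_dev_name() are hypothetical and not part
of the patch; plain libc is used instead of the QEMU/glib helpers.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* True if name is "vfio" followed by one or more decimal digits. */
static bool is_vfio_dev_name(const char *name)
{
    const char *p;

    if (strncmp(name, "vfio", 4) != 0) {
        return false;
    }
    p = name + 4;
    if (*p == '\0') {
        return false;
    }
    for (; *p; p++) {
        if (!isdigit((unsigned char)*p)) {
            return false;
        }
    }
    return true;
}

/*
 * Copy the first vfio\d+ entry found in dir_path into name.
 * Entries that do not match are skipped rather than treated as errors.
 * Returns 0 on success, -1 if the directory cannot be opened or no
 * entry matches.
 */
static int find_vfio_dev_name(const char *dir_path, char *name, size_t len)
{
    DIR *dir = opendir(dir_path);
    struct dirent *dent;
    int ret = -1;

    if (!dir) {
        return -1;
    }
    while ((dent = readdir(dir))) {
        if (is_vfio_dev_name(dent->d_name) && strlen(dent->d_name) < len) {
            snprintf(name, len, "%s", dent->d_name);
            ret = 0;
            break;
        }
    }
    closedir(dir);
    return ret;
}
```

The returned name can then be appended to "/dev/vfio/devices/" as a string,
without converting the numeric suffix.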
I also suggest falling back to using "/dev/char/%u:%u" if the above
does not exist which prevents "vfio/devices/vfio" from turning into
ABI.
Do you mean the case where there is no matching file under
/dev/vfio/devices/? Is that possible?
It would be a good idea to make a general open_cdev function that does
all this work once the sysfs is found and cdev read out of it, all the
other vfio places can use it too.
Hmm, a general open_cdev() function would be good to have. But I think
this is the only place in VFIO that opens the device cdev. Do you mean the
vdpa code?
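For illustration, a rough sketch of what such a helper could look like,
assuming the caller has already read the "MAJOR:MINOR" string from the
sysfs "dev" attribute. The names parse_devt() and open_cdev() and their
signatures are hypothetical, and the code is Linux/glibc specific
(sys/sysmacros.h). It tries the preferred node first and falls back to
"/dev/char/%u:%u", verifying in both cases that the opened node is the
expected character device.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse "MAJOR:MINOR" as found in a sysfs "dev" attribute. */
static int parse_devt(const char *contents, dev_t *devt)
{
    unsigned int maj, min;

    if (sscanf(contents, "%u:%u", &maj, &min) != 2) {
        return -1;
    }
    *devt = makedev(maj, min);
    return 0;
}

/* Open path and keep the fd only if it is the char device devt. */
static int open_and_check(const char *path, dev_t devt)
{
    struct stat st;
    int fd = open(path, O_RDWR);

    if (fd < 0) {
        return -1;
    }
    if (fstat(fd, &st) < 0 || !S_ISCHR(st.st_mode) || st.st_rdev != devt) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Try the preferred node (may be NULL), then fall back to
 * /dev/char/MAJOR:MINOR. Returns an fd, or -1 on failure.
 */
static int open_cdev(const char *preferred_path, dev_t devt)
{
    char fallback[64];
    int fd = -1;

    if (preferred_path) {
        fd = open_and_check(preferred_path, devt);
    }
    if (fd < 0) {
        snprintf(fallback, sizeof(fallback), "/dev/char/%u:%u",
                 major(devt), minor(devt));
        fd = open_and_check(fallback, devt);
    }
    return fd;
}
```

Checking st_rdev after open() avoids depending on the exact name under
/dev/vfio/devices/, which is the point of the fallback.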
+static int iommufd_attach_device(VFIODevice *vbasedev, AddressSpace *as,
+ Error **errp)
+{
+ VFIOContainer *bcontainer;
+ VFIOIOMMUFDContainer *container;
+ VFIOAddressSpace *space;
+ struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+ int ret, devfd, iommufd;
+ uint32_t ioas_id;
+ Error *err = NULL;
+
+ devfd = vfio_get_devicefd(vbasedev->sysfsdev, errp);
+ if (devfd < 0) {
+ return devfd;
+ }
+ vbasedev->fd = devfd;
+
+ space = vfio_get_address_space(as);
+
+ /* try to attach to an existing container in this space */
+ QLIST_FOREACH(bcontainer, &space->containers, next) {
+ if (!object_dynamic_cast(OBJECT(bcontainer),
+ TYPE_VFIO_IOMMUFD_CONTAINER)) {
+ continue;
+ }
+ container = container_of(bcontainer, VFIOIOMMUFDContainer, obj);
+ if (vfio_device_attach_container(vbasedev, container, &err)) {
+ const char *msg = error_get_pretty(err);
+
+ trace_vfio_iommufd_fail_attach_existing_container(msg);
+ error_free(err);
+ err = NULL;
+ } else {
+ ret = vfio_ram_block_discard_disable(true);
+ if (ret) {
+ vfio_device_detach_container(vbasedev, container, &err);
+ error_propagate(errp, err);
+ vfio_put_address_space(space);
+ close(vbasedev->fd);
+ error_prepend(errp,
+ "Cannot set discarding of RAM broken (%d)", ret);
+ return ret;
+ }
+ goto out;
+ }
+ }
?? this logic shouldn't be necessary, a single ioas always supports
all devices, userspace should never need to juggle multiple ioas's
unless it wants to have different address maps.
The legacy vfio backend needs to allocate multiple containers in some
cases. Say a device is attached to a container and some IOVAs have been
mapped in this container. Attaching another device to this container will
fail if the existing DMA mappings conflict with the reserved IOVAs of the
new device. In that case, legacy vfio creates a new container and attaches
the group to it. Hotplug is a typical example of this scenario.
I think the current iommufd also needs this option. The reserved_iova and
mapped iova areas are tracked in io_pagetable, and this structure is
per-IOAS. So if the mapped iova areas of an IOAS conflict with the
reserved_iova of a device that is about to be attached to that IOAS, the
attachment fails. For this to work, QEMU likewise needs to create another
IOAS and attach the device to the new IOAS.
struct io_pagetable {
	struct rw_semaphore domains_rwsem;
	struct xarray domains;
	unsigned int next_domain_id;
	struct rw_semaphore iova_rwsem;
	struct rb_root_cached area_itree;
	struct rb_root_cached reserved_iova_itree;
	unsigned long iova_alignment;
};
struct iommufd_ioas {
	struct iommufd_object obj;
	struct io_pagetable iopt;
	struct mutex mutex;
	struct list_head auto_domains;
};
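To make the conflict concrete, here is a small userspace sketch of the
check described above. It is not the kernel implementation: the kernel
tracks both sets in interval trees, while this sketch uses a linear scan
over plain arrays, and struct iova_range and ioas_can_accept() are
hypothetical names used only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive IOVA range [start, last]. */
struct iova_range {
    uint64_t start;
    uint64_t last;
};

static bool ranges_overlap(const struct iova_range *a,
                           const struct iova_range *b)
{
    return a->start <= b->last && b->start <= a->last;
}

/*
 * A new device can share the address space only if none of its reserved
 * ranges overlaps an area that is already mapped.
 */
static bool ioas_can_accept(const struct iova_range *mapped, size_t n_mapped,
                            const struct iova_range *reserved,
                            size_t n_reserved)
{
    size_t i, j;

    for (i = 0; i < n_reserved; i++) {
        for (j = 0; j < n_mapped; j++) {
            if (ranges_overlap(&reserved[i], &mapped[j])) {
                return false;
            }
        }
    }
    return true;
}
```

When the check fails for every existing address space, the caller would
have to allocate a new one and attach the device there, which is what the
loop over containers in the patch is for.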
Something I would like to see confirmed here in qemu is that qemu can
track the hw pagetable id for each device it binds because we will
need that later to do dirty tracking and other things.
Yes, QEMU already tracks the hwpt_id for each device. :-)
+ /*
+ * TODO: for now iommufd BE is on par with vfio iommu type1, so it's
+ * fine to add the whole range as window. For SPAPR, below code
+ * should be updated.
+ */
+ vfio_host_win_add(bcontainer, 0, (hwaddr)-1, 4096);
? Not sure what this is, but I don't expect any changes for SPAPR;
someday IOMMU_IOAS_IOVA_RANGES should be able to accurately report its
configuration.
I don't see IOMMU_IOAS_IOVA_RANGES called at all, that seems like a
problem..
(and note that IOVA_RANGES changes with every device attached to the IOAS)
Jason
--
Regards,
Yi Liu