> From: Jason Gunthorpe <jgg@xxxxxxxxxx> > Sent: Saturday, July 29, 2023 1:45 AM > > On Mon, Jul 24, 2023 at 04:03:56AM -0700, Yi Liu wrote: > > > +/** > > + * struct iommu_resv_iova_ranges - ioctl(IOMMU_RESV_IOVA_RANGES) > > + * @size: sizeof(struct iommu_resv_iova_ranges) > > + * @dev_id: device to read resv iova ranges for > > + * @num_iovas: Input/Output total number of resv ranges for the device > > + * @__reserved: Must be 0 > > + * @resv_iovas: Pointer to the output array of struct > iommu_resv_iova_range > > + * > > + * Query a device for ranges of reserved IOVAs. num_iovas will be set to > the > > + * total number of iovas and the resv_iovas[] will be filled in as space > > + * permits. > > + * > > + * On input num_iovas is the length of the resv_iovas array. On output it is > > + * the total number of iovas filled in. The ioctl will return -EMSGSIZE and > > + * set num_iovas to the required value if num_iovas is too small. In this > > + * case the caller should allocate a larger output array and re-issue the > > + * ioctl. > > + * > > + * Under nested translation, userspace should query the reserved IOVAs > for a > > + * given device, and report it to the stage-1 I/O page table owner to > exclude > > + * the reserved IOVAs. The reserved IOVAs can also be used to figure out > the > > + * allowed IOVA ranges for the IOAS that the device is attached to. For > detail > > + * see ioctl IOMMU_IOAS_IOVA_RANGES. > > I'm not sure I like this, the other APIs here work with the *allowed* > IOVAs, which is the inverse of this one which works with the > *disallowed* IOVAs. > > It means we can't take the output of this API and feed it into > IOMMUFD_CMD_IOAS_ALLOW_IOVAS.. Though I suppose qemu isn't going > to do > this anyhow. > > On the other hand, it is kind of hard to intersect an allowed list of > multiple idevs into a single master list. > > As it is, userspace will have to aggregate the list, sort it, merge > adjacent overlapping reserved ranges then invert the list to get an > allowed list. This is not entirely simple.. > > Did you already write an algorithm to do this in qemu someplace? Qemu is optional to aggregate it for S2 given IOMMU_IOAS_IOVA_RANGES is still being used. If the only purpose of using this new cmd is to report per-device reserved ranges to the guest then aggregation is not required. Arguably IOMMU_IOAS_IOVA_RANGES becomes redundant with this new cmd. But it's already there and as you said it's actually more convenient to be used if the user doesn't care about per-device reserved ranges... > > Anyhow, this should be split out from this series. It seems simple > enough to merge it now if someone can confirm what qemu needs. > > Jason