> From: Alex Williamson <alex.williamson@xxxxxxxxxx> > Sent: Friday, April 7, 2023 1:54 AM > > On Thu, 6 Apr 2023 10:02:10 +0000 > "Liu, Yi L" <yi.l.liu@xxxxxxxxx> wrote: > > > > From: Jason Gunthorpe <jgg@xxxxxxxxxx> > > > Sent: Thursday, April 6, 2023 7:23 AM > > > > > > On Wed, Apr 05, 2023 at 01:49:45PM -0600, Alex Williamson wrote: > > > > > > > > > QEMU can make a policy decision today because the kernel provides a > > > > > > sufficiently reliable interface, ie. based on the set of owned groups, a > > > > > > hot-reset is all but guaranteed to work. > > > > > > > > > > And we don't change that with cdev. If qemu wants to make the policy > > > > > decision it keeps using the exact same _INFO interface to make that > > > > > decision same it has always made. > > > > > > > > > > We weaken the actual reset action to only consider the security side. > > > > > > > > > > Applications that want this exclusive reset group policy simply must > > > > > check it on their own. It is a reasonable API design. > > > > > > > > I disagree, as I've argued before, the info ioctl becomes so weak and > > > > effectively arbitrary from a user perspective at being able to predict > > > > whether the hot-reset ioctl works that it becomes useless, diminishing > > > > the entire hot-reset info/execute API. > > > > > > reset should be strictly more permissive than INFO. If INFO predicts > > > reset is permitted then reset should succeed. > > > > > > We don't change INFO so it cannot "becomes so weak" ?? > > > > > > We don't care about the cases where INFO says it will not succeed but > > > reset does (temporarily) succeed. > > > > > > I don't get what argument you are trying to make or what you think is > > > diminished.. > > > > > > Again, userspace calls INFO, if info says yes then reset *always > > > works*, exactly just like today. > > > > > > Userspace will call reset with a 0 length FD list and it uses a > > > security only check that is strictly more permissive than what > > > get_info will return. So the new check is simple in the kernel and > > > always works in the cases we need it to work. > > > > > > What is getting things into trouble is insisting that RESET have > > > additional restrictions beyond the minimum checks required for > > > security. > > > > > > > > I don't view it as a loophole, it is flexability to use the API in a > > > > > way that is different from what qemu wants - eg an app like dpdk may > > > > > be willing to tolerate a reset group that becomes unavailable after > > > > > startup. Who knows, why should we force this in the kernel? > > > > > > > > Because look at all the problems it's causing to try to introduce these > > > > loopholes without also introducing subtle bugs. > > > > > > These problems are coming from tring to do this integrated version, > > > not from my approach! > > > > > > AFAICT there was nothing wrong with my original plan of using the > > > empty fd list for reset. What Yi has here is some mashup of what you > > > and I both suggested. > > > > Hi Alex, Jason, > > > > could be this reason. So let me try to gather the changes of this series > > does and the impact as far as I know. > > > > 1) only check the ownership of opened devices in the dev_set > > in HOT_RESET ioctl. > > - Impact: it changes the relationship between _INFO and HOT_RESET. > > As " Each group must have IOMMU protection established for the > > ioctl to succeed." in [1], existing design actually means userspace > > should own all the affected groups before heading to do HOT_RESET. > > With the change here, the user does not need to ensure all affected > > groups are opened and it can do hot-reset successfully as long as the > > devices in the affected group are just un-opened and can be reset. > > > > [1] https://patchwork.kernel.org/project/linux- > pci/patch/20130814200845.21923.64284.stgit@xxxxxxxxxx/ > > Where whether a device is opened is subject to change outside of the > user's control. This essentially allows the user to perform hot-resets > of devices outside of their ownership so long as the device is not > used elsewhere, versus the current requirement that the user own all the > affected groups, which implies device ownership. It's not been > justified why this feature needs to exist, imo. > > > 2) Allow passing zero-length fd array to do hot reset > > - Impact: this uses the iommufd as ownership check in the kernel side. > > It is only supposed to be used by the users that open cdev instead of > > users that open group. The drawback is that it cannot cover the noiommu > > devices as noiommu does not use iommufd at all. But it works well for > > most cases. > > The "only supposed to be used" is problematic here, we're extending all > the interfaces to transparently accept group and device fds, but here > we need to make a distinction because the ioctl needs to perform one > way for groups and another way for devices, which it currently doesn't > do. As above, I've not seen sufficient justification for this other > than references to reducing complexity, but the only userspace expected > to make use of this interface already has equivalent complexity. > > > 3) Allow hot reset be successful when the dev_set is singleton > > - Impact: this makes sense but it seems to mess up the boundary between > > the group path and cdev path w.r.t. the usage of zero-length fd approach. > > The group path can succeed to do hot reset even if it is passing an empty > > fd array if the dev_set happens to be singleton. > > Again, what is the justification for requiring this, it seems to be > only a hack towards no-iommu support with cdev, which we can achieve by > other means. Why have we not needed this in the group model? It > introduces subtle loopholes, so while maybe we could, I don't see why we > should, therefore I cannot agree with "this makes sense". > > > 4) Allow passing device fd to do hot reset > > - Impact: this is a new way for hot reset. should have no impact. > > > > 5) Extend the _INFO to report devid > > - Impact: this changes the way user to decode the info reported back. > > devid and groupid are returned per the way the queried device is opened. > > Since it was suggested to support the scenario in which some devices > > are opened via cdev while some devices are opened via group. This makes > > us to return invalid_devid for the device that is opened via group if > > it is affected by the hot reset of a device that is opened via cdev. > > > > This was proposed to support the future device fd passing usage which is > > only available in cdev path. > > I think this is fundamentally flawed because of the scope of the > dev-id. We can only provide dev-ids for devices which belong to the > same iommufd of the calling device, thus there are multiple instances > where no dev-id can be provided. The group-id and bdf are static > properties of the devices, regardless of their ownership. The bdf > provides the specific device level association while the group-id > indicates implied, static ownership. > > > To me the major confusion is from 1) and 3). 1) changes the meaning of > > _INFO and HOT_RESET, while 3) messes up the boundary. > > As above, I think 2) is also an issue. > > > Here is my thought: > > > > For 1), it was proposed due to below reason[2]. We'd like to make a scenario > > that works in the group path be workable in cdev path as well. But IMHO, we > > may just accept that cdev path cannot work for such scenario to avoid sublte > > change to uapi. Otherwise, we need to have another HOT_RESET ioctl or a > > hint in HOT_RESET ioctl to tell the kernel whether relaxed ownership check > > is expected. Maybe this is awkward. But if we want to keep it, we'd do it > > with the awareness by user. > > > > [2] https://lore.kernel.org/kvm/Y%2FdobS6gdSkxnPH7@xxxxxxxxxx/ > > The group association is that relaxed ownership test. Yes, there are > corner cases where we have a dual function card with separate IOMMU > groups, where a user owning function 0 could do a bus reset because > function 1 is temporarily unused, but so what, what good is that, have > we ever had an issue raised because of this? The user can't rely on > the unopened state of the other function. It's an entirely > opportunistic optimization. > > The much more typical scenario is that a multi-function device does not > provide isolation, all the functions are in the same group and because > of the association of the group the user has implied ownership of the > other devices for the purpose of a reset. > > > For 3), it was proposed when discussing the hot reset for noiommu[3]. But > > it does not make hot reset always workable for noiommu in cdev, just in > > case dev_set is singleton. So it is more of a general optimization that can > > make the kernel skip the ownership check. But to make use of it, we may > > need to test it before sanitizing the group fds from user or the iommufd > > check. Maybe the dev_set singleton test in this series is not well placed. > > If so, I can further modify it. > > > > [3] https://lore.kernel.org/kvm/ZACX+Np%2FIY7ygqL5@xxxxxxxxxx/ > > As above, this seems to be some optimization related to no-iommu for > cdev because we don't have an iommufd association for the device in > no-iommu mode. Note however that the current group interface doesn't > care about the IOMMU context of the devices. We only need proof that > the user owns the affected groups. So why are we bringing iommufd > context anywhere into this interface, here or the null-array interface? > > It seems like the minor difference with cdev is that a) we're passing > device fds rather than group fds, and b) those device fds need to be > validated as having device access to complete the proof of ownership > relative to the group. Otherwise we add capabilities to > DEVICE_GET_INFO to support the device fd passing model where the user > doesn't know the device group or bdf and allow the reset ioctl itself > to accept device fds (extracting the group relationship for those which > the user has configured for access). Thanks, so your suggestion is to drop 1) 2) 3) and 5), keep 4) and add new bdf/group capability to DEVICE_GET_INFO to retrieve group_id and bdf. In this way, the existing _INFO ioctl can be reused without any change. is it? Regards, Yi Liu