On Mon, 17 Apr 2023 16:31:56 -0300 Jason Gunthorpe <jgg@xxxxxxxxxx> wrote: > On Mon, Apr 17, 2023 at 01:01:40PM -0600, Alex Williamson wrote: > > Yes, it's not trivial, but Jason is now proposing that we consider > > mixing groups, cdevs, and multiple iommufd_ctxs as invalid. I think > > this means that regardless of which device calls INFO, there's only one > > answer (assuming same set of devices opened, all cdev, all within same > > iommufd_ctx). Based on what I explained about my understanding of INFO2 > > and Jason agreed to, I think the output would be: > > > > flags: NOT_RESETABLE | DEV_ID > > { > > { valid devA-id, devA-BDF }, > > { valid devC-id, devC-BDF }, > > { valid devD-id, devD-BDF }, > > { invalid dev-id, devE-BDF }, > > } > > > > Here devB gets dropped because the kernel understands that devB is > > unopened, affected, and owned. It's therefore not a blocker for > > hot-reset. > > I don't think we want to drop anything because it makes the API > ill suited for the debugging purpose. > > devb should be returned with an invalid dev_id if I understand your > example. Maybe it should return with -1 as the dev_id instead of 0, to > make the debugging a bit better. > > Userspace should look at only NOT_RESETTABLE to determine if it > proceeds or not, and it should use the valid dev_id list to iterate > over the devices it has open to do the config stuff. If an affected device is owned, not opened, and not interfering with the reset, what is it adding to the API to report it for debugging purposes? I'm afraid this leads into expanding "invalid dev-id" into an errno or bitmap of error conditions that the user needs to parse. > > OTOH, devE is unopened, affected, and un-owned, and we > > previously agreed against the opportunistic un-opened/un-owned loophole. > > NOT_RESETABLE should be returned in this case, yes. > > If we want to enable userspace to use the loophole it should be an > additional flag. RESETABLE_FOR_NOW or something Ugh, please no. It's always a volatile result, but a volatile result that relies on device state outside the scope or control of the user is not even worthwhile imo. Thanks, Alex