On Tue, Oct 04, 2022 at 02:19:57PM -0600, Alex Williamson wrote:
> > >> v2
> > >>  - Rebase on the vfio struct device series and the container.c series
> > >>  - Drop patches 1 & 2, we need to have working error unwind, so another
> > >>    test is not a problem
> > >>  - Fold iommu_group_remove_device() into vfio_device_remove_group() so
> > >>    that it forms a strict pairing with the two allocation functions.
> > >>  - Drop the iommu patch from the series, it needs more work and discussion
> > >> v1 https://lore.kernel.org/r/0-v1-ef00ffecea52+2cb-iommu_group_lifetime_jgg@xxxxxxxxxx
> > >>
> > >> This could probably use another quick sanity test due to all the rebasing,
> > >> Alex if you are happy let's wait for Matthew.
> > >>
> > >
> > > I have been re-running the same series of tests on this version (on
> > > top of vfio-next) and this still resolves the reported issue. Thanks
> > > Jason!
> >
> > Hmm, there's more going on with this patch besides the issues with -ap
> > and -ccw. While it does indeed resolve the crashes I had been seeing,
> > I just now noticed that I see monotonically increasing iommu group IDs
> > (implying we are not calling iommu_group_release as much as we should
> > be) when running the same test case that would previously trigger the
> > occasional crash (host device is powered off):

Yeah, I noticed that when writing the other patch, NULLing the
iommu_group quietly broke release. It should be fixed in the followup
by moving the iommu_group_put.

> I need to break my next branch anyway to correct a Fixes: sha1, so let
> me know if we should just drop this for now instead. Thanks,

I suspect other following patches will conflict with dropping it,
maybe better to just fix it.

Jason
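
For readers following the thread, the failure mode described above (clearing a pointer before dropping the reference it held, so that release never runs and IDs are never recycled) can be illustrated with a small userspace sketch. This is not the vfio or iommu code: `struct group`, `group_alloc()`, `group_put()`, `device_remove_buggy()`, `device_remove_fixed()` and `cycle()` are hypothetical names invented for illustration, and the lowest-free-ID allocator is only an assumption about how released IDs would be reused.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted group; NOT struct iommu_group. */
struct group {
    int id;
    int refs;
};

#define MAX_IDS 8
static int id_in_use[MAX_IDS];

/* Hand out the lowest free ID, so released IDs get reused. */
static struct group *group_alloc(void)
{
    for (int id = 0; id < MAX_IDS; id++) {
        if (!id_in_use[id]) {
            struct group *g = malloc(sizeof(*g));
            if (!g)
                return NULL;
            id_in_use[id] = 1;
            g->id = id;
            g->refs = 1;
            return g;
        }
    }
    return NULL;
}

/* Drop one reference; the last put releases the ID and the memory. */
static void group_put(struct group *g)
{
    if (!g)
        return;
    if (--g->refs == 0) {
        id_in_use[g->id] = 0;
        free(g);
    }
}

struct device {
    struct group *group;
};

/* Buggy teardown: the pointer is cleared before the put, so the put
 * sees NULL, does nothing, and the reference (and its ID) leaks. */
static void device_remove_buggy(struct device *d)
{
    d->group = NULL;
    group_put(d->group);
}

/* Fixed teardown: keep a local copy so the put still sees the group. */
static void device_remove_fixed(struct device *d)
{
    struct group *g = d->group;

    d->group = NULL;
    group_put(g);
}

/* Attach and detach n times; return the last ID handed out, or -1. */
static int cycle(int n, void (*teardown)(struct device *))
{
    struct device d = { 0 };
    int last = -1;

    for (int i = 0; i < n; i++) {
        d.group = group_alloc();
        if (!d.group)
            return -1;
        last = d.group->id;
        teardown(&d);
    }
    return last;
}
```

In this sketch, repeated cycles with `device_remove_fixed()` keep getting ID 0 back, while repeated cycles with `device_remove_buggy()` see the ID climb on every iteration, which matches the "monotonically increasing iommu group IDs" symptom reported above.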