On Thu, Nov 16, 2023 at 01:05:58PM -0400, Jason Gunthorpe wrote:
> On Wed, Nov 15, 2023 at 06:25:44PM +0000, Robin Murphy wrote:
> > It turns out there are more subtle races beyond just the main part of
> > __iommu_probe_device() itself running in parallel - the dev_iommu_free()
> > on the way out of an unsuccessful probe can still manage to trip up
> > concurrent accesses to a device's fwspec. Thus, extend the scope of
> > iommu_probe_device_lock() to also serialise fwspec creation and initial
> > retrieval.
> >
> > Reported-by: Zhenhua Huang <quic_zhenhuah@xxxxxxxxxxx>
> > Link: https://lore.kernel.org/linux-iommu/e2e20e1c-6450-4ac5-9804-b0000acdf7de@xxxxxxxxxxx/
> > Fixes: 01657bc14a39 ("iommu: Avoid races around device probe")
> > Signed-off-by: Robin Murphy <robin.murphy@xxxxxxx>
> > ---
> >
> > This is my idea of a viable fix, since it does not need a 700-line
> > diffstat to make the code do what it was already *trying* to do anyway.
> > This stuff should fundamentally not be hanging off driver probe in the
> > first place, so I'd rather get on with removing the underlying
> > brokenness than waste time and effort polishing it any further.
>
> I'm fine with this as some hacky backport, but I don't want to see
> this cross-layer leakage left in the next merge window.
>
> ie we should still do my other series on top of and reverting this.
>
> I've poked at moving parts of it under probe and I think we can do
> substantial amounts in about two more series and a tidy a bunch of
> other stuff too.

I agree, it's messy and acpi should not need this, BUT at the moment, I
can't see any other way to resolve this simply. So here's a begrudged
ack:

Acked-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

and hopefully the larger series should resolve this correctly? Can that
be rebased on top of this?

Also, cc: stable on this for whomever applies it?

thanks,

greg k-h