On Thu, 2018-08-16 at 10:58 +0300, Konstantin Khlebnikov wrote:
> On 16.08.2018 00:52, Benjamin Herrenschmidt wrote:
> > On Wed, 2018-08-15 at 13:50 -0500, Bjorn Helgaas wrote:
> > > Yes, this is definitely broken. Some folks have tried to fix it in
> > > the past, but it hasn't quite happened yet. We actually merged one
> > > patch, 40f11adc7cd9 ("PCI: Avoid race while enabling upstream
> > > bridges"), but had to revert it after we found issues:
> > >
> > > https://lkml.kernel.org/r/1501858648-22228-1-git-send-email-srinath.mannam@xxxxxxxxxxxx
> > > https://lkml.kernel.org/r/20170915072352.10453.31977.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> >
> > Ok so I had a look at this previous patch and it adds yet another use of
> > some global mutex to protect part of the operation which makes me
> > cringe a bit, we have too many of these.
> >
> > What do you think of the one I sent yesterday ? (I can't find it in the
> > archives yet)
> >
> > [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races
> >
> > The patch itself needs splitting etc... but the basic idea is to move away
> > from those global mutexes in a number of places and have one in the pci_dev
> > struct itself to protect its state.
> >
> > I would also like to use this rather than the bitmap atomics for is_added
> > etc... (Hari's fix) in the long run. Atomics aren't significantly cheaper
> > and imho makes thing even messier.
> >
> > Jens, Konstantin, any chance you can test if the above also breaks iwlwifi
> > (I don't see why it would but ...)
> >
>
> I suppose original race was discovered between enabling bridge and device as described here
> https://lore.kernel.org/lkml/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
>
> I barely can remember what I ever posted this, so I couldn't reproduce for sure.

Ok. Well, my patch fixes it for my repro-case at least and seems to not
break anything on my thinkpad so ...

Bjorn, are you ok with the approach ?
If yes, I'll start breaking up that patch into a few smaller bits in case
something goes wrong and we want to bisect (such as the changes I did to
tracking is_busmaster etc...)

Cheers,
ben.