Re: PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition)

On 16.08.2018 00:52, Benjamin Herrenschmidt wrote:
On Wed, 2018-08-15 at 13:50 -0500, Bjorn Helgaas wrote:
Yes, this is definitely broken.  Some folks have tried to fix it in
the past, but it hasn't quite happened yet.  We actually merged one
patch, 40f11adc7cd9 ("PCI: Avoid race while enabling upstream
bridges"), but had to revert it after we found issues:

https://lkml.kernel.org/r/1501858648-22228-1-git-send-email-srinath.mannam@xxxxxxxxxxxx
https://lkml.kernel.org/r/20170915072352.10453.31977.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Ok so I had a look at this previous patch and it adds yet another use of
some global mutex to protect part of the operation, which makes me
cringe a bit; we have too many of these.

What do you think of the one I sent yesterday? (I can't find it in the
archives yet)

[RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races

The patch itself needs splitting etc... but the basic idea is to move away
from those global mutexes in a number of places and have one in the pci_dev
struct itself to protect its state.

I would also like to use this rather than the bitmap atomics for is_added
etc... (Hari's fix) in the long run. Atomics aren't significantly cheaper
and imho make things even messier.
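
To make the per-device lock idea concrete, here is a minimal userspace
sketch. It is not the RFC patch: pthread mutexes stand in for kernel
mutexes, and every name in it (sketch_dev, sketch_enable, hw_writes, ...)
is hypothetical. The assumption it illustrates is that the enable count
and the "hardware is actually enabled" state change inside one critical
section, so a second caller cannot see a non-zero count before the bridge
has really been set up.

```c
/* Userspace sketch only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct sketch_dev {
	pthread_mutex_t lock;		/* protects the three fields below */
	int enable_cnt;			/* number of active enable requests */
	int hw_enabled;			/* simulated "device is enabled" state */
	int hw_writes;			/* how often we "programmed" the hardware */
	struct sketch_dev *upstream;	/* parent bridge, NULL at the root */
};

static void sketch_dev_init(struct sketch_dev *dev, struct sketch_dev *upstream)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->enable_cnt = 0;
	dev->hw_enabled = 0;
	dev->hw_writes = 0;
	dev->upstream = upstream;
}

static void sketch_enable(struct sketch_dev *dev)
{
	/*
	 * Enable the upstream bridge first, before taking our own lock,
	 * so only one device lock is ever held at a time.
	 */
	if (dev->upstream)
		sketch_enable(dev->upstream);

	pthread_mutex_lock(&dev->lock);
	if (dev->enable_cnt++ == 0) {
		/* First user: do the (simulated) hardware setup under the lock. */
		dev->hw_writes++;
		dev->hw_enabled = 1;
	}
	pthread_mutex_unlock(&dev->lock);
}

/* Drops one enable request on this device only; upstream is left alone. */
static void sketch_disable(struct sketch_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	assert(dev->enable_cnt > 0);
	if (--dev->enable_cnt == 0)
		dev->hw_enabled = 0;
	pthread_mutex_unlock(&dev->lock);
}
```

With two children behind one bridge, calling sketch_enable() on both
leaves the bridge with a count of 2 but only one simulated hardware
write. A caller that arrives while the first is still inside the
critical section blocks until the setup is complete, instead of
returning early on a count that was bumped before the hardware was
touched.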

Jens, Konstantin, any chance you can test whether the above also breaks
iwlwifi? (I don't see why it would, but ...)


I suppose the original race was discovered between enabling a bridge and a device, as described here:

https://lore.kernel.org/lkml/150547971091.977464.16294045866179907260.stgit@buzz/T/#u

I can barely remember ever posting this, so I can't say for sure whether I could reproduce it.

Cheers,
Ben.




