Re: PCIe enable device races (Was: [PATCH v3] PCI: Data corruption happening due to race condition)

On 8/16/18 2:02 AM, Benjamin Herrenschmidt wrote:
> On Thu, 2018-08-16 at 10:58 +0300, Konstantin Khlebnikov wrote:
>> On 16.08.2018 00:52, Benjamin Herrenschmidt wrote:
>>> On Wed, 2018-08-15 at 13:50 -0500, Bjorn Helgaas wrote:
>>>> Yes, this is definitely broken.  Some folks have tried to fix it in
>>>> the past, but it hasn't quite happened yet.  We actually merged one
>>>> patch, 40f11adc7cd9 ("PCI: Avoid race while enabling upstream
>>>> bridges"), but had to revert it after we found issues:
>>>>
>>>> https://lkml.kernel.org/r/1501858648-22228-1-git-send-email-srinath.mannam@xxxxxxxxxxxx
>>>> https://lkml.kernel.org/r/20170915072352.10453.31977.stgit@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
>>>
>>> Ok so I had a look at this previous patch and it adds yet another use of
>>> some global mutex to protect part of the operation which makes me
>>> cringe a bit, we have too many of these.
>>>
>>> What do you think of the one I sent yesterday ? (I can't find it in the
>>> archives yet)
>>>
>>> [RFC PATCH] pci: Proof of concept at fixing pci_enable_device/bridge races
>>>
>>> The patch itself needs splitting etc... but the basic idea is to move away
>>> from those global mutexes in a number of places and have one in the pci_dev
>>> struct itself to protect its state.
>>>
>>> I would also like to use this rather than the bitmap atomics for is_added
>>> etc... (Hari's fix) in the long run. Atomics aren't significantly cheaper
>>> and imho make things even messier.
>>>
>>> Jens, Konstantin, any chance you can test if the above also breaks iwlwifi
>>> (I don't see why it would but ...)
>>>
>>
>> I suppose the original race was discovered between enabling the bridge and the device, as described here:
>>
>> https://lore.kernel.org/lkml/150547971091.977464.16294045866179907260.stgit@buzz/T/#u
>>
>> I can barely remember that I ever posted this, so I can't say for sure that I could reproduce it.
> 
> Ok. Well, my patch fixes it for my repro-case at least and seems to not
> break anything on my thinkpad so ...
> 
> Bjorn, are you ok with the approach ? If yes, I'll start breaking up
> that patch into a few smaller bits in case something goes wrong and we
> want to bisect (such as the changes I did to tracking is_busmaster
> etc...)

I can try it too, but I was never CC'ed on the actual patch.

-- 
Jens Axboe
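
The per-device lock idea quoted above (one mutex in the device structure protecting its state, instead of a global mutex) can be illustrated with a minimal userspace sketch. This is not the RFC patch and not kernel code: `struct model_dev`, `model_dev_init()`, `model_enable()` and the `hw_enabled` field are hypothetical names invented for illustration, and pthread mutexes stand in for kernel mutexes. The sketch assumes the race is a second caller seeing a non-zero enable count and returning before the upstream bridge has actually been enabled.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace model; NOT the kernel's struct pci_dev. */
struct model_dev {
    struct model_dev *upstream;  /* parent bridge, NULL at the root */
    pthread_mutex_t lock;        /* protects enable_cnt and hw_enabled */
    int enable_cnt;              /* number of enable requests so far */
    int hw_enabled;              /* stands in for the real hardware enable */
};

static void model_dev_init(struct model_dev *dev, struct model_dev *upstream)
{
    dev->upstream = upstream;
    pthread_mutex_init(&dev->lock, NULL);
    dev->enable_cnt = 0;
    dev->hw_enabled = 0;
}

/*
 * Enable a device and, on its first enable, its upstream bridges.
 * The device's own lock is held for the whole operation, so a concurrent
 * caller for the same device blocks until hw_enabled has been set rather
 * than returning early on a non-zero count. Locks are always taken from
 * child toward root, which keeps the ordering consistent in a tree.
 */
static void model_enable(struct model_dev *dev)
{
    pthread_mutex_lock(&dev->lock);
    if (dev->enable_cnt++ == 0) {
        if (dev->upstream)
            model_enable(dev->upstream);
        dev->hw_enabled = 1;
    }
    pthread_mutex_unlock(&dev->lock);
}
```

In this model, enabling two devices behind the same bridge leaves the bridge enabled with a count of two, and neither caller can return before both the bridge and its own device are marked enabled. Whether the real patch holds the lock across the upstream walk in the same way is an assumption of the sketch.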



