Re: Possible PCI Regression Linux 5.3-rc1

On Sun, Aug 04, 2019 at 04:47:36PM +0800, Nicholas Johnson wrote:
> On Thu, Jul 25, 2019 at 09:50:48AM -0600, Logan Gunthorpe wrote:
> > 
> > 
> > On 2019-07-25 7:18 a.m., Nicholas Johnson wrote:
> > > On Wed, Jul 24, 2019 at 08:38:14AM -0500, Bjorn Helgaas wrote:
> > >> On Wed, Jul 24, 2019 at 12:54:00PM +0000, Nicholas Johnson wrote:
> > >>> Hi all,
> > >>>
> > >>> I was just rebasing my patches for linux 5.3-rc1 and noticed a possible 
> > >>> regression that shows on both of my machines. It is also reproducible 
> > >>> with the unmodified Ubuntu mainline kernel, downloadable at [1].
> > >>>
> > >>> Running the lspci command takes 1-3 seconds with 5.3-rc1 (rather than an 
> > >>> imperceptible amount of time). Booting with pci.dyndbg does not reveal 
> > >>> why.
> > >>>
> > >>> $ uname -r
> > >>> 5.3.0-050300rc1-generic
> > >>> $ time lspci -vt 1>/dev/null
> > >>>
> > >>> real	0m2.321s
> > >>> user	0m0.026s
> > >>> sys	0m0.000s
> > >>>
> > >>> If none of you are aware of this or what is causing it, I will submit a 
> > >>> bug report to Bugzilla.
> > >>
> > >> I wasn't aware of this; thanks for reporting it!  I wasn't able to
> > >> reproduce this in qemu.  Can you play with "strace -r lspci -vt" and
> > >> the like?  Maybe try "lspci -n" to see if it's related to looking up
> > >> the names?
> > > 
> > > For a second you had me doubting myself - it could have been an Ubuntu 
> > > thing. But no, I just reproduced it on Arch Linux, and double checked 
> > > that it was not doing it on 5.2. Also, the problem occurs even without 
> > > the PCI kernel parameters which I usually pass.
> > 
> > Ok, can you bisect to find the commit that causes this issue?
> 
> I have done a partial bisect and then found the culprit commit by visual 
> inspection. I would have done the full bisect, but I am using a highly
> underpowered i7-7700K, so each round requires 20-30 minutes of compiling.
> 
> Reverting the following commit solves the issue:
> 
> commit c2bf1fc212f7e6f25ace1af8f0b3ac061ea48ba5
> Author: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
> PCI: Add missing link delays required by the PCIe spec
> 
> Mika, care to weigh in (assuming you are back from four weeks leave)? 
> Clearly this creates delays in "lspci -vt" in some Thunderbolt systems, 
> but not all - otherwise you would have caught it. You mentioned Ice Lake 
> in the commit log so perhaps it works fine on Ice Lake.
> 
> Thanks,
> Nicholas

I am re-posting to add more information. Here is my Thunderbolt topology, 
with the NHI on bus 05 ("lspci -t"):

           +-1c.4-[03-6d]----00.0-[04-6d]--+-00.0-[05]----00.0
           |                               +-01.0-[06-38]----00.0-[07-38]----01.0-[08]----00.0
           |                               +-02.0-[39]----00.0
           |                               \-04.0-[3a-6d]--

$ time cat /sys/bus/pci/devices/0000\:05\:00.0/config | hexdump
0000000 8086 15d2 0406 0010 0002 0880 0000 0000
0000010 0000 ac00 0000 ac04 0000 0000 0000 0000
0000020 0000 0000 0000 0000 0000 0000 1028 08af
0000030 0000 0000 0080 0000 0000 0000 01ff 0000
0000040

real	0m1.132s
user	0m0.002s
sys	0m0.000s

But the same read is instant for devices on all of the other buses, so the 
delay has something to do with the NHI in particular.
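
For anyone who wants to run the same check across every device at once, 
here is a minimal sketch. It is not part of my original test: the function 
name time_config_reads is made up for illustration, and it assumes GNU 
date (for the %N nanosecond field) and the usual sysfs layout. It reads 
each device's config file once and prints the elapsed milliseconds, 
slowest first.

```shell
# Time one read of every device's "config" file under a sysfs-style
# directory. Prints "<elapsed ms> <device>" per line, slowest first.
# Hypothetical helper: the name and output format are illustrative only.
time_config_reads() {
    tcr_base="${1:-/sys/bus/pci/devices}"
    for tcr_dev in "$tcr_base"/*; do
        # Skip entries that have no readable config file
        [ -r "$tcr_dev/config" ] || continue
        tcr_start=$(date +%s%N)          # nanoseconds; assumes GNU date
        cat "$tcr_dev/config" > /dev/null
        tcr_end=$(date +%s%N)
        printf '%d %s\n' "$(( (tcr_end - tcr_start) / 1000000 ))" "${tcr_dev##*/}"
    done | sort -rn
}
```

Running "time_config_reads | head" should put any slow device at the top 
of the list. The figures include the overhead of starting cat and date, so 
only delays on the order of the one second seen above are meaningful.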

Regards,
Nicholas

> 
> > 
> > Thanks,
> > 
> > Logan


