Re: Initial ACPI root bus discovery vs. rescan

On Tue, Jun 02, 2015 at 01:54:18PM -0400, Prarit Bhargava wrote:
> On 05/26/2015 12:07 PM, Bjorn Helgaas wrote:

> ...
> [    1.546925] pci_bus 0000:00: root bus resource [bus 00-3e]
> [    1.552397] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
> [    1.559165] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
> [    1.565934] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
> [    1.573398] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff window]
> [    1.580861] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff window]
> [    1.588322] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff window]
> [    1.595784] pci_bus 0000:00: root bus resource [mem 0x000dc000-0x000dffff window]
> [    1.603246] pci_bus 0000:00: root bus resource [mem 0x000e0000-0x000e3fff window]
> [    1.610707] pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff window]
> [    1.618170] pci_bus 0000:00: root bus resource [mem 0xb0000000-0xfeafffff window]

> [    1.637470] pci 0000:00:16.3: [8086:1e3d] type 00 class 0x070002
> [    1.637486] pci 0000:00:16.3: reg 0x10: [io  0x70a0-0x70a7]
> [    1.637495] pci 0000:00:16.3: reg 0x14: [mem 0xb1580000-0xb1580fff]

> [    2.961417] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    2.988543] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> [    3.016847] serial8250: ttyS2 at I/O 0x3e8 (irq = 4, base_baud = 115200) is a 16550A
> [    3.045264] 0000:00:16.3: ttyS1 at I/O 0x70a0 (irq = 19, base_baud = 115200) is a 16550A

> > In this scenario, I assume the serial port device remains powered all the
> > time, even while it is logically removed from the system, so when we
> > re-enumerate and find the device, I would think its BARs would still
> > contain whatever they had before, and since they are still valid, we should
> > still use them.
> 
> Nope.  The device should go down as ttyS1 is not active.

I'm talking about the 00:16.3 PCI device.  I doubt there's anything that
would remove power from it when you do the "echo 1 > remove".  Of course,
Linux will forget about it, and 00:16.3 shouldn't show up in lspci output,
but from the device's point of view, nothing has really changed.  When we
rescan, we should find it just as we left it (it's possible we'd clear bits
in the PCI_COMMAND register or something, but I'm not sure we even do
that, and I'm pretty sure we don't clear out the BARs).
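
One way to test that assumption would be to read the device's config space
through its sysfs "config" file before the remove and again after the
rescan.  The sketch below is only illustrative: read_cfg and dump_bars are
made-up helper names, the offsets (0x04 for PCI_COMMAND, 0x10 and 0x14 for
the first two BARs) are the standard config header offsets, and it assumes
a little-endian host because od prints in host byte order.  It cannot look
at the device while it is removed, since the sysfs node is gone then.

```shell
#!/bin/sh
# Hypothetical helpers (not existing tools) for dumping PCI_COMMAND and
# the first two BARs from a raw PCI config space file.

# read_cfg FILE OFFSET NBYTES
# Print NBYTES (2 or 4) starting at decimal OFFSET as one hex value.
# od uses host byte order, so this assumes a little-endian machine (x86).
read_cfg() {
    od -An -tx"$3" -j "$2" -N "$3" "$1" | tr -d ' \n'
}

# dump_bars FILE
# FILE would normally be /sys/bus/pci/devices/<domain:bus:dev.fn>/config.
# In a raw BAR value, bit 0 set marks an I/O BAR.
dump_bars() {
    echo "command: 0x$(read_cfg "$1" 4 2)"
    echo "bar0:    0x$(read_cfg "$1" 16 4)"
    echo "bar1:    0x$(read_cfg "$1" 20 4)"
}
```

Running "dump_bars /sys/bus/pci/devices/0000:00:16.3/config" before the
remove and after the rescan would give two dumps to compare.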

> > So I think my expectation is the same as yours, and I don't know why it
> > doesn't work that way.  I assume the device actually *works* with the new
> > resources, so it's not really broken in that sense, but it does bother me
> > if we're changing something when we don't need to change it.
> 
> Yep ... I think it's broken.  Here's what I'm doing to down then rescan
> the device.
> 
> [root@intel-chiefriver-04 ~]# cd /sys/devices/pci0000\:00/0000\:00\:16.3
> [root@intel-chiefriver-04 0000:00:16.3]# echo 1 > remove
> [root@intel-chiefriver-04 0000:00:16.3]# lspci | grep 16.3
> [root@intel-chiefriver-04 0000:00:16.3]# cd ../pci_bus/0000\:00/
> [root@intel-chiefriver-04 0000:00]# echo 1 > rescan

The /sys/devices/pci0000:00/0000:00:16.3/ directory should disappear when
you remove the device.  In this case you were *inside* the directory when
you did the remove, so your shell is holding a reference to it.  But if you
do this:

    # cd /sys/devices/pci0000:00
    # echo 1 > 0000:00:16.3/remove
    # ls

you should not see the 0000:00:16.3 directory any more.

> and the console contains
> 
> [  353.212980] pci 0000:00:16.3: [8086:1e3d] type 00 class 0x070002
> [  353.231163] pci 0000:00:16.3: BAR 1: assigned [mem 0xb1520000-0xb1520fff]
> [  353.237937] pci 0000:00:16.3: BAR 0: assigned [io  0x1018-0x101f]

There should be some more output here.  I usually boot with
"ignore_loglevel" to make sure I see everything.

The first line looks like it's from pci_setup_device().  The "BAR x:
assigned" lines look like they're from pci_assign_resource().  But there
should be "reg 0x%x: %pR\n" lines from __pci_read_base() in the middle.
Those would show us what we actually got from the device BARs.
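
To see at a glance whether the "reg" lines are present, the kernel log can
be filtered for just this device's enumeration messages.  A rough sketch,
assuming the message formats quoted in this thread; pci_bar_lines is a
hypothetical helper name, and it reads the log on stdin so it works with
dmesg output or a saved console log.

```shell
#!/bin/sh
# pci_bar_lines DEV
# Hypothetical helper: read a kernel log on stdin and print only the
# enumeration lines for DEV (e.g. 0000:00:16.3): the "[vendor:device]"
# line, the "reg 0x..:" lines, and the "BAR n: assigned" lines.
# DEV is used as a regex, so its dots match any character; that is
# loose but good enough for a quick look.
pci_bar_lines() {
    grep -E "pci $1: (\[[0-9a-f]{4}:[0-9a-f]{4}\]|reg 0x[0-9a-f]+:|BAR [0-9]+: assigned)"
}
```

For example, "dmesg | pci_bar_lines 0000:00:16.3" after the rescan should
show the "reg" lines between the first line and the "assigned" lines if
they were logged at all.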

Can you check /proc/iomem and /proc/ioports before you remove it, after you
remove it, and a third time after you do the rescan (I guess it's hung
after the rescan, so you probably can't do that)?  I wonder if we forget to
remove the original resources when we remove it, and then we reassign them
after the rescan because we think they're still in use.
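
The before/after comparison could be scripted roughly as follows.  This is
a sketch, not a tested procedure: snapshot_resources and compare_resources
are hypothetical helper names, and the optional third argument exists only
so the helpers can be pointed at a directory other than /proc.  Run it as
root, since unprivileged reads of /proc/iomem may show zeroed addresses on
newer kernels.

```shell
#!/bin/sh
# snapshot_resources LABEL OUTDIR [PROCDIR]
# Save copies of iomem and ioports from PROCDIR (default /proc) into
# OUTDIR, tagged with LABEL (e.g. "before", "removed", "rescanned").
snapshot_resources() {
    label=$1
    outdir=$2
    procdir=${3:-/proc}
    mkdir -p "$outdir" || return 1
    cat "$procdir/iomem"   > "$outdir/iomem.$label"   || return 1
    cat "$procdir/ioports" > "$outdir/ioports.$label" || return 1
}

# compare_resources OUTDIR LABEL_A LABEL_B
# Show what changed between two snapshots.  diff exits non-zero when
# the files differ, so a non-zero status here is not an error.
compare_resources() {
    diff -u "$1/iomem.$2"   "$1/iomem.$3"
    diff -u "$1/ioports.$2" "$1/ioports.$3"
}
```

Taking a "before" snapshot, doing the remove, taking a "removed" snapshot,
and then comparing the two should show whether the 00:16.3 entries actually
went away.  If the machine hangs after the rescan, the first two snapshots
are still available on disk after a reboot, provided OUTDIR is not on a
tmpfs.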

> and the system is hung.  The addresses are clearly different here and I'll
> debug further.  

Even if we change the 00:16.3 BARs, I wouldn't think the system would
hang unless we actually try to *use* those new addresses.

And ... I guess the next thing we *should* see is the serial driver
claiming this device again, and it would read some of the serial port
registers.  I bet if you added instrumentation to pciserial_init_one(),
you'd see output until you get into pciserial_init_ports().

The new addresses don't *look* like they should conflict with anything, but
maybe they do.  I think we just need to figure out why we can't use the
original addresses, and my first guess is that we don't release those
resources correctly.

> The one odd thing is that the ttyS1 device is not removed from
> /sys/ during the remove.  It's almost as if it is left around as a place holder
> for 16.3 if it should be reinserted, and I have a feeling that may lead to the
> system hang.

I'm not exactly sure what you're saying here.  Are you saying there's a
/sys/.../ttyS1 that still exists after the remove?  Or do you mean that
/sys/.../0000:00:16.3 still exists after the remove?  

If the former, maybe there's something wrong with the remove path in the
serial driver.  If the latter, that seems wrong unless there's something like
a shell holding a reference to the directory.

Bjorn