Re: Initial ACPI root bus discovery vs. rescan

On 06/02/2015 04:44 PM, Bjorn Helgaas wrote:
> On Tue, Jun 02, 2015 at 01:54:18PM -0400, Prarit Bhargava wrote:
>> On 05/26/2015 12:07 PM, Bjorn Helgaas wrote:
> 
>> ...
>> [    1.546925] pci_bus 0000:00: root bus resource [bus 00-3e]
>> [    1.552397] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
>> [    1.559165] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
>> [    1.565934] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
>> [    1.573398] pci_bus 0000:00: root bus resource [mem 0x000d0000-0x000d3fff window]
>> [    1.580861] pci_bus 0000:00: root bus resource [mem 0x000d4000-0x000d7fff window]
>> [    1.588322] pci_bus 0000:00: root bus resource [mem 0x000d8000-0x000dbfff window]
>> [    1.595784] pci_bus 0000:00: root bus resource [mem 0x000dc000-0x000dffff window]
>> [    1.603246] pci_bus 0000:00: root bus resource [mem 0x000e0000-0x000e3fff window]
>> [    1.610707] pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff window]
>> [    1.618170] pci_bus 0000:00: root bus resource [mem 0xb0000000-0xfeafffff window]
> 
>> [    1.637470] pci 0000:00:16.3: [8086:1e3d] type 00 class 0x070002
>> [    1.637486] pci 0000:00:16.3: reg 0x10: [io  0x70a0-0x70a7]
>> [    1.637495] pci 0000:00:16.3: reg 0x14: [mem 0xb1580000-0xb1580fff]
> 
>> [    2.961417] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> [    2.988543] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
>> [    3.016847] serial8250: ttyS2 at I/O 0x3e8 (irq = 4, base_baud = 115200) is a 16550A
>> [    3.045264] 0000:00:16.3: ttyS1 at I/O 0x70a0 (irq = 19, base_baud = 115200) is a 16550A
> 
>>> In this scenario, I assume the serial port device remains powered all the
>>> time, even while it is logically removed from the system, so when we
>>> re-enumerate and find the device, I would think its BARs would still
>>> contain whatever they had before, and since they are still valid, we should
>>> still use them.
>>
>> Nope.  The device should go down as ttyS1 is not active.
> 
> I'm talking about the 00:16.3 PCI device.  I doubt there's anything that
> would remove power from it when you do the "echo 1 > remove".  Of course,
> Linux will forget about it, and 00:16.3 shouldn't show up in lspci output,
> but from the device's point of view, nothing has really changed.  When we
> rescan, we should find it just as we left it (it's possible we'd clear bits
> in the PCI_COMMAND register or something, but I'm not sure we even do
> that, and I'm pretty sure we don't clear out the BARs).
> 
>>> So I think my expectation is the same as yours, and I don't know why it
>>> doesn't work that way.  I assume the device actually *works* with the new
>>> resources, so it's not really broken in that sense, but it does bother me
>>> if we're changing something when we don't need to change it.
>>
>> Yep ... I think it's broken.  Here's what I'm doing to take the device
>> down and then rescan it.
>>
>> [root@intel-chiefriver-04 ~]# cd /sys/devices/pci0000\:00/0000\:00\:16.3
>> [root@intel-chiefriver-04 0000:00:16.3]# echo 1 > remove
>> [root@intel-chiefriver-04 0000:00:16.3]# lspci | grep 16.3
>> [root@intel-chiefriver-04 0000:00:16.3]# cd ../pci_bus/0000\:00/
>> [root@intel-chiefriver-04 0000:00]# echo 1 > rescan
> 
> The /sys/devices/pci0000:00/0000:00:16.3/ directory should disappear when
> you remove the device.  In this case you were *inside* the directory when
> you did the remove, so your shell is holding a reference to it.  But if you
> do this:
> 
>     # cd /sys/devices/pci0000:00
>     # echo 1 > 0000:00:16.3/remove
>     # ls
> 
> you should not see the 0000:00:16.3 directory any more.
> 
>> and the console contains
>>
>> [  353.212980] pci 0000:00:16.3: [8086:1e3d] type 00 class 0x070002
>> [  353.231163] pci 0000:00:16.3: BAR 1: assigned [mem 0xb1520000-0xb1520fff]
>> [  353.237937] pci 0000:00:16.3: BAR 0: assigned [io  0x1018-0x101f]
> 
> There should be some more output here.  I usually boot with
> "ignore_loglevel" to make sure I see everything.
> 
> The first line looks like it's from pci_setup_device().  The "BAR x:
> assigned" lines look like they're from pci_assign_resource().  But there
> should be "reg 0x%x: %pR\n" lines from __pci_read_base() in the middle.
> Those would show us what we actually got from the device BARs.

Hmm ... I'm booting with ignore_loglevel (that's my default FWIW) and didn't see
those printks.  I'm definitely going to debug that...
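
To make the missing lines easier to spot, here is a minimal sketch that filters a
kernel log down to the BAR-related messages for one device.  bar_lines is a
made-up helper name, and the patterns are based only on the message formats
quoted in this thread ("reg 0x..: ..." and "BAR n: assigned ..."), so treat them
as an assumption rather than a complete list.

```shell
#!/bin/sh
# bar_lines (hypothetical helper): read a kernel log on stdin and print only
# the BAR-related lines for the given PCI device.
#   "reg 0x..:" lines show what was read back from the device BARs
#   "BAR n: assigned" lines show what the kernel allocated
bar_lines() {
    dev="$1"
    grep -F "pci $dev: " | grep -E 'reg 0x[0-9a-f]+: |BAR [0-9]+: assigned'
}

# Example use (not run here):
#   dmesg | bar_lines 0000:00:16.3
```

If the rescan output shows "BAR n: assigned" lines for the device but no
"reg 0x" lines, that matches what I'm seeing.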

> 
> Can you check /proc/iomem and /proc/ioports before you remove it, after you
> remove it, and a third time after you do the rescan (I guess it's hung
> after the rescan, so you probably can't do that)?  I wonder if we forget to
> remove the original resources when we remove it, and then we reassign them
> after the rescan because we think they're still in use.

Yep, tried this already -- the mem and io regions are definitely released by the
serial driver.
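
For anyone who wants to repeat the comparison, a minimal sketch of the
before/after snapshots.  The function names, the /tmp/pci-res default directory
and the PROC_DIR override are all made up for illustration; only /proc/iomem and
/proc/ioports come from the discussion above.  Run it as root, otherwise the
addresses in /proc/iomem may be shown as zeros.

```shell
#!/bin/sh
# snapshot_resources LABEL [DIR]
# Save /proc/iomem and /proc/ioports under a label such as "before",
# "removed" or "rescanned".  PROC_DIR is a hypothetical override used only
# so the sketch can be tried against copies of the files.
snapshot_resources() {
    label="$1"
    dir="${2:-/tmp/pci-res}"
    src="${PROC_DIR:-/proc}"
    mkdir -p "$dir" || return 1
    cat "$src/iomem"   > "$dir/iomem.$label"
    cat "$src/ioports" > "$dir/ioports.$label"
}

# diff_resources LABEL_A LABEL_B [DIR]
# Show what changed between two labelled snapshots.
diff_resources() {
    dir="${3:-/tmp/pci-res}"
    diff -u "$dir/iomem.$1"   "$dir/iomem.$2"
    diff -u "$dir/ioports.$1" "$dir/ioports.$2"
    return 0
}

# Example sequence (not run here):
#   snapshot_resources before
#   echo 1 > /sys/devices/pci0000:00/0000:00:16.3/remove
#   snapshot_resources removed
#   diff_resources before removed
```

Lines that start with "-" in the diff are ranges that were present before the
remove and are gone afterwards.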

> 
>> and the system is hung.  The addresses are clearly different here and I'll
>> debug further.  
> 
> Even if we change the 00:16.3 BARs, I wouldn't think the system would
> hang unless we actually try to *use* those new addresses.
> 
> And ... I guess the next thing we *should* see is the serial driver
> claiming this device again, and it would read some of the serial port
> registers.  I bet if you added instrumentation to pciserial_init_one(),
> you'd see output until you get into pciserial_init_ports().
> 

Yeah, was just about to do that :)  I'm going to see if we get into the serial
driver at all.

> The new addresses don't *look* like they should conflict with anything, but
> maybe they do.  I think we just need to figure out why we can't use the
> original addresses, and my first guess is that we don't release those
> resources correctly.
> 
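
As a quick sanity check on the new addresses, here is a minimal sketch that
tests whether a range lies inside a root bus window.  in_window is a made-up
helper, and the numbers are copied from the log excerpts earlier in this mail.
Being inside a window only rules out one kind of problem; it says nothing about
overlap with other devices in the same window.

```shell
#!/bin/sh
# in_window START END WSTART WEND (hypothetical helper)
# Succeeds if [START, END] lies entirely inside the window [WSTART, WEND].
# Shell arithmetic accepts the 0x-prefixed values as they appear in the log.
in_window() {
    [ $(( $1 >= $3 && $2 <= $4 )) -eq 1 ]
}

# BAR 0 after rescan: [io 0x1018-0x101f] vs. [io 0x0d00-0xffff window]
in_window 0x1018 0x101f 0x0d00 0xffff &&
    echo "io 0x1018-0x101f is inside the io 0x0d00-0xffff window"

# BAR 1 after rescan: [mem 0xb1520000-0xb1520fff] vs. [mem 0xb0000000-0xfeafffff window]
in_window 0xb1520000 0xb1520fff 0xb0000000 0xfeafffff &&
    echo "mem 0xb1520000-0xb1520fff is inside the mem 0xb0000000-0xfeafffff window"
```
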
>> The one odd thing is that the ttyS1 device is not removed from
>> /sys/ during the remove.  It's almost as if it is left around as a place holder
>> for 16.3 if it should be reinserted, and I have a feeling that may lead to the
>> system hang.
> 
> I'm not exactly sure what you're saying here.  Are you saying there's a
> /sys/.../ttyS1 that still exists after the remove?  Or do you mean that
> /sys/.../0000:00:16.3 still exists after the remove?  

/sys/...ttyS1 exists after the remove.  That has me scratching my head -- AFAICT
nothing is using ttyS1 on the system and I'm focusing in on the remove code as
you point out below.

P.

> 
> If the former, maybe there's something wrong with the remove path in the
> serial driver.  If the latter, that seems wrong unless there's something like
> a shell holding a reference to the directory.
> 
> Bjorn
> 
> 
