Re: Diagnosing BAR allocation problems with complex bridge structure

> On 3 Jan 2017, at 18:07, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> 
> Hi Harry,
> 
> On Tue, Jan 03, 2017 at 03:35:13PM +0000, Harry Mallon wrote:
>> Hi,
>> 
>> In this email I am asking for advice on how to diagnose problems in assigning memory for PCI-E devices and bridges. I can reproduce my issue on the current mainline kernel but I am not currently planning to target any fixes for the mainline kernel (unless they prove to be useful outside of this machine). I am planning to target the CentOS 3.10.0 based kernel. I understand that I am not owed any help/patches etc from anyone here, especially not as I am using a non mainline kernel.
>> 
>> I am working on a machine with an odd PCI structure: it has 4 different PLX bridges and requires hotplug to work on at least 2 of these. We were previously using a kernel based on 3.3 and had to add a few (hacky, machine-specific) patches to make it work correctly. We also use "pci=realloc,pcie_bus_safe" on the kernel command line. I am currently using that kernel as a reference to compare against my development version.
>> 
>> On my current setup some devices don't work. Using "lspci -vvv" I can see that they are not all receiving the memory allocations that they need. They report as one of the following two (which one depends on whether another PCI-E card is in or out):
>> 
>> 05:00.0 VGA compatible controller: NVIDIA Corporation GP102 [TITAN X] (rev a1) (prog-if 00 [VGA controller])
>>        Subsystem: NVIDIA Corporation Device 119a
>>        Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
>>        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>>        Interrupt: pin A routed to IRQ 11
>>        Region 0: Memory at <ignored> (32-bit, non-prefetchable) [disabled]
>>        Region 1: Memory at e0000000 (64-bit, prefetchable) [disabled] [size=256M]
>>        Region 3: Memory at f0000000 (64-bit, prefetchable) [disabled] [size=32M]
>>        Region 5: I/O ports at 4000 [disabled] [size=128]
>>        Expansion ROM at <ignored> [disabled]
>> 
>> 05:00.0 VGA compatible controller: NVIDIA Corporation GK110B [GeForce GTX TITAN Black] (rev ff) (prog-if ff)
>> 	!!! Unknown header type 7f
>> 
>> What tools and techniques can anyone recommend for diagnosing this type of problem? Is there a way to export all the bridge memory ranges in a way that can be visualised (maybe the newer kernel cannot allocate enough aligned space)? Is there a way to enable extra PCI debug in the kernel? Is there a way to make the kernel panic and report when it fails to assign memory on boot instead of continuing with non-functional hardware?
> 
> A complete dmesg log from a current kernel and complete "lspci -vv"
> output is the best place to start.  The goal is that all your devices
> should work without requiring any machine-specific patches or
> command-line parameters.  Our resource allocation code is not really
> very robust, so unusual topologies don't always work out of the box.
> 
> You can open a bug report at https://bugzilla.kernel.org in the
> drivers/PCI area, attach the dmesg and lspci output, and respond with
> the URL here.

https://bugzilla.kernel.org/show_bug.cgi?id=191921

I have attached the messages log (zipped, as it wouldn't let me upload the whole thing) and the lspci output.

> 
> Bjorn
