Re: Problem with PCIe enumeration of Google/Coral TPU Edge module on Linux

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

A quick look at the logs, makes me wonder if BAR0-BAR5 are only being
assigned to IO space on device by Linux, and BAR6 is the first bar
index that Linux is assigning on armhf/arm64 for mem space. If so,
that would be wrong because registers 0x10 and and 0x18 are BAR0 and
BAR2.
The TP-Link Gigabit LAN card that is installed on the other PCIe slot
has BAR 0 enabled but it is IO space and according to the registers
for the mem space, in that device, seen in dmesg, they are regs 0x18
and 0x20, or, BAR 2 and BAR 4, but Linux is assigning them to have
indices BAR 6 and BAR 8, since they are MEM space devices.
That looks wrong.... maybe the TP-Link device is working just because
BAR 0 does happen to exist in this case, which happens to be the
minimal requirement that allows pci_enable_device(...) to work,
passing in the test 'if (!r->parent)' performed by
pci_enable_resources(...) at drivers/pci/setup-res.c. Since most PCIe
cards have IO space, this generally works.
Can it be?

Luís

On Sat, Mar 7, 2020 at 12:11 PM Luís Mendes <luis.p.mendes@xxxxxxxxx> wrote:
>
> Hi Bjorn,
>
> Thanks for your help.
>
> On Fri, Mar 6, 2020 at 9:47 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > [+cc Thomas, Jason, Nicholas, Ben]
> >
> > On Fri, Mar 06, 2020 at 02:32:59PM +0000, Luís Mendes wrote:
> > > Hi,
> > >
> > > I'm trying to use Google/Coral TPU Edge modules for a project, on
> > > arm64 and armhf, but BAR0 doesn't get assigned during the enumeration
> > > of PCIe devices and consequently pci_enable_device(...) fails on BAR0
> > > resource with value -22 (EINVAL) (resource has null parent) when
> > > loading gasket/apex driver.
> > >
> > > I'm also trying to adapt gasket/apex to run on armhf, but anyhow that
> > > is not the root cause for this issue.
> > >
> > > Relevant Log extracts follow in attachment.
> >
> > Hi Luís,
> >
> > Thanks for the report, and sorry for the problem you're tripping over.
> > I cc'd a few folks who might be interested.
> >
> > > [    6.983880] mvebu-pcie soc:pcie: /soc/pcie/pcie@1,0: reset gpio is active low
> > > [    6.993528] hub 4-1:1.0: 4 ports detected
> > > [    6.993749] mvebu-pcie soc:pcie: /soc/pcie/pcie@2,0: reset gpio is active low
> > > [    7.106741]  sdb: sdb1
> > > [    7.109826] sd 2:0:0:0: [sdb] Attached SCSI removable disk
> > > [    7.242916] mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
> > > [    7.248854] pci_bus 0000:00: root bus resource [bus 00-ff]
> > > [    7.254370] pci_bus 0000:00: root bus resource [mem 0xd0000000-0xefffffff]
> > > [    7.261267] pci_bus 0000:00: root bus resource [io  0x1000-0xeffff]
> > > [    7.267621] pci 0000:00:01.0: [11ab:6828] type 01 class 0x060400
> > > [    7.273662] pci 0000:00:01.0: reg 0x38: [mem 0x00000000-0x000007ff pref]
> > > [    7.293971] PCI: bus0: Fast back to back transfers disabled
> > > [    7.299558] pci 0000:00:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
> > > [    7.315694] pci 0000:01:00.0: [1ac1:089a] type 00 class 0x0000ff
> > > [    7.321749] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00003fff 64bit pref]
> > > [    7.322814] usb 4-1.1: new high-speed USB device number 3 using xhci-hcd
> > > [    7.329004] pci 0000:01:00.0: reg 0x18: [mem 0x00000000-0x000fffff 64bit pref]
> > > [    7.343111] pci 0000:01:00.0: 2.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x1 link at 0000:00:01.0 (capable of 4.000 Gb/s with 5 GT/s x1 link)
> > > [    7.383442] PCI: bus1: Fast back to back transfers disabled
> > > [    7.389031] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
> > > [    7.495604] pci 0000:00:02.0: ASPM: current common clock configuration is broken, reconfiguring
> > > [    7.552513] pci 0000:00:01.0: BAR 8: assigned [mem 0xe8000000-0xe81fffff]
> > > [    7.565611] pci 0000:00:01.0: BAR 6: assigned [mem 0xe8200000-0xe82007ff pref]
> > > [    7.580096] pci 0000:00:01.0: PCI bridge to [bus 01]
> > > [    7.585079] pci 0000:00:01.0:   bridge window [mem 0xe8000000-0xe81fffff]
> > > [    7.653228] pcieport 0000:00:01.0: enabling device (0140 -> 0142)
> > >
> > >
> > > [   11.188025] gasket: module is from the staging directory, the quality is unknown, you have been warned.
> > > [   11.217048] apex: module is from the staging directory, the quality is unknown, you have been warned.
> > > [   11.217926] apex 0000:01:00.0: can't enable device: BAR 0 [mem 0x00000000-0x00003fff 64bit pref] not claimed
> > > [   11.227825] apex 0000:01:00.0: error enabling PCI device
> >
> > It looks like we did assign space for the bridge window leading to
> > 01:00.0, but failed to assign space to the 01:00.0 BAR itself.
> >
> > I don't know offhand why that would be.  Can you put the entire dmesg
> > log somewhere we can see?  That will tell us what kernel you're using
> > and possibly other useful things.
>
> Sure, complete dmesg is available at: https://pastebin.ubuntu.com/p/qnzJ56kM7k/
> This is a custom built machine based on the Armada 388 armhf SoC, that
> I am using, but I've also tried an arm64 machine from Toradex, an
> Apalis IMX8QM with the Apalis Eval board, producing similar results
> with different kernels and also different ARM architectures.
> >
> > Does the same problem happen with other devices, or does it only
> > happen with the gasket/apex combination?  There shouldn't be anything
> > device-specific in the PCI core resource assignment code.
>
> This issue seems to happen only with the Coral Edge TPU device, but it
> happens independently of whether the gasket/apex driver module is
> loaded or not. The BAR 0 of the Coral device is not assigned either
> way.
>
> Luís




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux