Re: Problem with PCIe enumeration of Google/Coral TPU Edge module on Linux

Hi Nicholas, Bjorn,

I was able to make the apex driver work on an x86_64 system with the
Coral Edge TPU PCIe device.
So the PCI enumeration problem is now clearly an ARM and ARM64
platform issue. What are the recommended steps for debugging this? I
have a JTAG interface and an OpenOCD-supported configuration for it.
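
For comparing the ARM and x86_64 results, one quick check is the
per-device sysfs "resource" file, where each line holds the start, end
and flags of one resource in hexadecimal. The following is only a
minimal sketch: the names demo_bar, demo_parse_resource_line and
demo_bar_is_assigned are hypothetical, and treating a zero start
address as "not assigned" is an assumption (a heuristic), not something
taken from the kernel sources.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type: one line of a PCI device's sysfs "resource" file. */
struct demo_bar {
    unsigned long long start;
    unsigned long long end;
    unsigned long long flags;
};

/* Parse "0x<start> 0x<end> 0x<flags>"; returns 0 on success, -1 on malformed input. */
int demo_parse_resource_line(const char *line, struct demo_bar *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, "%llx %llx %llx", &out->start, &out->end, &out->flags) != 3)
        return -1;
    return 0;
}

/* Heuristic (an assumption): a BAR whose start address is zero has no address assigned. */
int demo_bar_is_assigned(const struct demo_bar *bar)
{
    assert(bar != NULL);
    return bar->start != 0 && bar->end >= bar->start;
}
```

Each line read from the device's "resource" file can be passed to
demo_parse_resource_line(); the first lines correspond to the BARs.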

On x86_64 the PCI device enumerated properly, but the driver failed to
load because of a bug. With the patch below, which I submitted to the
Gasket driver maintainers, the driver loads and I was able to run a
couple of the examples (on x86_64 only):

Fix incongruency in handling of sysfs entries creation.

This issue could cause invalid memory accesses, by not properly
detecting the end of the sysfs attributes array.

Signed-off-by: Luis Mendes <luis.p.mendes@xxxxxxxxx>
---

 gasket_sysfs.c |    3 +--
 gasket_sysfs.h |    4 ----
 2 files changed, 1 insertion(+), 6 deletions(-)
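
To illustrate the class of bug the patch description refers to, here is
a minimal sketch of a sentinel-terminated attribute array. It is not
the gasket code: struct demo_attr, DEMO_END_OF_ATTR_ARRAY and
demo_count_attrs are hypothetical names, and the choice of a NULL name
as the terminator is an assumption made for the example. The point is
that the code walking the array must test for exactly the same
terminator that the array definition uses, otherwise the loop runs past
the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysfs attribute descriptor; not the gasket type. */
struct demo_attr {
    const char *name;   /* NULL marks the end of the array (assumed convention) */
    unsigned int mode;
};

/* The terminator entry; the walker below must test for this exact convention. */
#define DEMO_END_OF_ATTR_ARRAY { NULL, 0 }

/* Count entries up to, but not including, the terminator. */
size_t demo_count_attrs(const struct demo_attr *attrs)
{
    size_t n = 0;

    assert(attrs != NULL);
    while (attrs[n].name != NULL)
        n++;
    return n;
}

/* Example array with two real entries followed by the terminator. */
const struct demo_attr demo_attrs[] = {
    { "status", 0444 },
    { "reset_count", 0444 },
    DEMO_END_OF_ATTR_ARRAY,
};
```

If the walker compared names against some other marker than the one
used by the terminator entry, it would never see the end of the array.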

Kind Regards,
Luís

> On Mon, Mar 9, 2020 at 11:21 AM Luís Mendes <luis.p.mendes@xxxxxxxxx> wrote:
> >
> > Hi Nicholas,
> >
> > Thanks for your help.
> > Replies follow below.
> >
> > Kind Regards,
> > Luís
> >
> > On Sun, Mar 8, 2020 at 5:51 AM Nicholas Johnson
> > <nicholas.johnson-opensource@xxxxxxxxxxxxxx> wrote:
> > >
> > > Hi,
> > > > > On Sat, Mar 7, 2020 at 12:11 PM Luís Mendes <luis.p.mendes@xxxxxxxxx> wrote:
> > > > > > This issue seems to happen only with the Coral Edge TPU device, but it
> > > > > > happens independently of whether the gasket/apex driver module is
> > > > > > loaded or not. The BAR 0 of the Coral device is not assigned either
> > > > > > way.
> > > > > >
> > > > > > Luís
> > >
> > > So the problem only occurs with the Coral Edge TPU device, so there is a
> > > possibility that it is not a problem with the platform, or something
> > > caused by the combination of the TPU and platform. Is it possible to put
> > > the TPU into an X86 system with the same kernel version(s) to add more
> > > evidence to this theory? If it works on X86 then we can focus on the
> > > differences between X86 and ARM.
> >
> > I've tested two Coral TPUs on two x86_64 platforms and the BARs seem
> > to be assigned. However, the driver fails to load during probe and the
> > system is unable to shut down cleanly. I think that is a driver
> > issue when setting up sysfs entries. I can blacklist gasket/apex, if
> > it helps in some way, or try an older kernel.
> > Dmesg log for one of the x86_64 systems is here:
> > https://pastebin.ubuntu.com/p/FHhHNN6XTF/
> > lspci -vvv for the same x86_64 system is here:
> > https://pastebin.ubuntu.com/p/xbSNWFQ9TS/
> >
> > >
> > > Also, please revert c13704f5685d "PCI: Avoid double hpmemsize MMIO
> > > window assignment" or try with Linux v5.4 which does not have this
> > > commit, just to rule out the possibility of it causing issues. This
> > > patch helps me and also solved the problem of one other person using an
> > > ARM computer who came to us regarding a problem. However, it could also
> > > adversely affect unknown use cases - it is impossible to completely rule
> > > out, due to the nature of how drivers/pci/setup-bus.c is written.
> >
> > On armhf with 5.4.14 the problem remains, BAR 0 and BAR 2 are not
> > assigned: https://pastebin.ubuntu.com/p/9H2qqqMNJN/
> > I've also tried kernel 4.20.11 and the problem also exists.
> > I've got JTAG on this system with OpenOCD. I believe I can debug the
> > kernel, if needed.
> >
> > >
> > > Kind regards,
> > > Nicholas
