Re: [Bug 205701] New: Can't access RAM from PCIe

On Fri, Dec 06, 2019 at 06:48:24PM +0200, Ranran wrote:
> On Fri, Dec 6, 2019 at 5:08 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Fri, Dec 06, 2019 at 08:09:48AM +0200, Ranran wrote:
> > > On Fri, Nov 29, 2019 at 8:38 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > On Fri, Nov 29, 2019 at 06:10:51PM +0200, Ranran wrote:
> > > > > On Fri, Nov 29, 2019 at 4:58 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > > On Fri, Nov 29, 2019 at 06:59:48AM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=205701
> >
> > > I have tried to upgrade to the latest kernel, 5.4 (from elrepo on
> > > CentOS), but on this processor/board (System x3650, Xeon) it hangs
> > > during kernel boot without any error in dmesg; it just keeps waiting
> > > for a couple of minutes and then drops to dracut.
> >
> > - I don't think you ever said exactly what the original failure mode
> >   was.  You said DMA from an FPGA failed.  What is the specific
> >   device?  How do you know the DMA fails?
> 
> The FPGA is an Intel Arria 10 device.

I really meant which bus/device/function it is so we can correlate it
with the dmesg log and lspci output.
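
As a rough sketch of that correlation (not something from the bug
report): once the bus/device/function is known, the matching lines can
be pulled out of saved dmesg and lspci output.  The function names
below are hypothetical, and the code assumes the usual
"[domain:]bus:device.function" notation, with domain 0000 when it is
omitted.

```python
import re

# Matches a PCI address "[domain:]bus:device.function",
# e.g. "0000:03:00.0" or the short form "03:00.0".
BDF_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):"
    r"(?P<dev>[0-9a-fA-F]{2})\."
    r"(?P<fn>[0-7])$"
)


def normalize_bdf(text):
    """Return the full lowercase form "dddd:bb:dd.f" (domain defaults to 0000)."""
    m = BDF_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a PCI bus/device/function: {text!r}")
    domain = m.group("domain") or "0000"
    return f"{domain}:{m.group('bus')}:{m.group('dev')}.{m.group('fn')}".lower()


def lines_for_device(log_text, bdf):
    """Return the lines of saved dmesg/lspci text that refer to the device.

    A line matches if it contains the full address anywhere (dmesg,
    "lspci -D") or starts with the short "bb:dd.f" form (plain lspci).
    """
    full = normalize_bdf(bdf)
    short = full.split(":", 1)[1]
    matches = []
    for line in log_text.splitlines():
        lowered = line.lower()
        if full in lowered or lowered.startswith(short + " "):
            matches.append(line)
    return matches
```

Calling lines_for_device() on the saved dmesg text with the FPGA's
address would then give the BAR assignments and any error messages
logged for that function.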

> We know that the DMA fails because, when probing the DMA transaction
> from the FPGA to the CPU's RAM with Signal Tap, we see that it stalls,
> i.e. it keeps waiting for the access to finish.
> We don't observe any error in dmesg.

I'm not familiar with Signal Tap, but Google suggests that it's
basically an embedded logic analyzer on the FPGA itself.  So I assume
that:

  - On the working system (Intel DUO?) Signal Tap shows the PCIe
    Memory Read TLP from the FPGA and the matching Completion.

  - On the non-working system Signal Tap shows the PCIe Memory Read
    TLP from the FPGA but the Completion never arrives.  I assume the
    FPGA eventually logs a Completion Timeout error?

My guess would be something's wrong with the address the FPGA is
generating.  So please collect the complete dmesg log and /proc/iomem
contents and the address used in the FPGA DMA TLP from both the
working and non-working systems.  There should be some clue if we
look at the differences between the systems.
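
A minimal sketch of how the address check could be done, assuming the
/proc/iomem contents were saved as root (non-root reads show all
addresses as zero) and that no IOMMU is translating the FPGA's
addresses, so the address in the TLP can be compared directly against
CPU physical addresses.  The function names are hypothetical.

```python
def parse_iomem(text):
    """Parse /proc/iomem text into (start, end, depth, name) tuples.

    Each line looks like "  00100000-7fffffff : System RAM"; nested
    resources are indented two spaces per level.
    """
    entries = []
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip blank or unexpected lines
        span, name = line.split(" : ", 1)
        depth = (len(span) - len(span.lstrip(" "))) // 2
        start_s, end_s = span.strip().split("-")
        entries.append((int(start_s, 16), int(end_s, 16), depth, name.strip()))
    return entries


def regions_containing(entries, addr):
    """Return (depth, name) for every resource whose range includes addr."""
    return [
        (depth, name)
        for start, end, depth, name in entries
        if start <= addr <= end
    ]
```

If regions_containing() returns nothing, or nothing named "System RAM",
for the address seen in the Memory Read TLP, that would point at the
address rather than at the PCIe link.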

> >   You may also be able to just drop a v5.4 kernel on your v4.18
> >   system, at least for testing purposes.
> >
> What does it mean to drop a 5.4 kernel on a 4.18 kernel?

Not on a v4.18 *kernel*; on the CentOS *file system* that was
installed along with your v4.18-based kernel.  If you take a v5.4
kernel built with the right config options/modules/etc, it should work
on the same root filesystem as the v4.18 kernel.

Bjorn


