Re: [Bug 205701] New: Can't access RAM from PCIe

On Fri, Dec 6, 2019 at 7:57 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Fri, Dec 06, 2019 at 06:48:24PM +0200, Ranran wrote:
> > On Fri, Dec 6, 2019 at 5:08 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Fri, Dec 06, 2019 at 08:09:48AM +0200, Ranran wrote:
> > > > On Fri, Nov 29, 2019 at 8:38 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > On Fri, Nov 29, 2019 at 06:10:51PM +0200, Ranran wrote:
> > > > > > On Fri, Nov 29, 2019 at 4:58 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > > > On Fri, Nov 29, 2019 at 06:59:48AM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=205701
> > >
> > > > I have tried to upgrade to the latest kernel 5.4 (elrepo in CentOS), but
> > > > with this processor/board (System x3650, Xeon), it hangs during
> > > > kernel boot without any error in dmesg; it just keeps waiting for
> > > > a couple of minutes and then drops to dracut.
> > >
> > > - I don't think you ever said exactly what the original failure mode
> > >   was.  You said DMA from an FPGA failed.  What is the specific
> > >   device?  How do you know the DMA fails?
> >
> > FPGA is Intel's Arria 10 device.
>
> I really meant which bus/device/function it is so we can correlate it
> with the dmesg log and lspci output.
>
> > We know that the DMA fails because, when probing the DMA
> > transaction from the FPGA to the CPU's RAM with Signal Tap, we see
> > that it stalls, i.e. it keeps waiting for the access to finish.
> > We don't observe any error in dmesg.
>
> I'm not familiar with Signal Tap, but Google suggests that it's
> basically an embedded logic analyzer on the FPGA itself.  So I assume
> that:
>
>   - On the working system (Intel DUO?) Signal Tap shows the PCIe
>     Memory Read TLP from the FPGA and the matching Completion.
>
>   - On the non-working system Signal Tap shows the PCIe Memory Read
>     TLP from the FPGA but the Completion never arrives.  I assume the
>     FPGA eventually logs a Completion Timeout error?
>
> My guess would be something's wrong with the address the FPGA is
> generating.  So please collect the complete dmesg log and /proc/iomem
> contents and the address used in the FPGA DMA TLP from both the
> working and non-working systems.  There should be some clue if we
> look at the differences between the systems.
>
> > >   You may also be able to just drop a v5.4 kernel on your v4.18
> > >   system, at least for testing purposes.
> > >
> > What does it mean to drop a 5.4 kernel on a 4.18 kernel?
>
> Not on a v4.18 *kernel*; on the CentOS *file system* that was
> installed along with your v4.18-based kernel.  If you take a v5.4
> kernel built with the right config options/modules/etc, it should work
> on the same root filesystem as the v4.18 kernel.
>
> Bjorn

Hello,

I've installed Ubuntu 19.10 with kernel 5.3, and I still see the same
issue on the Xeon system.
I've attached the output of lspci -vv.
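
For the /proc/iomem comparison requested above, a minimal sketch of how
the DMA target address could be checked against the memory map on each
system. The function names and the sample map below are made up for
illustration; the real input would be the contents of /proc/iomem read
as root (unprivileged reads show all addresses as zero), and the
address would be the one seen in the FPGA's Memory Read TLP.

```python
import re

# One /proc/iomem line: optional indentation, "start-end : name" in hex.
IOMEM_LINE = re.compile(r"^(\s*)([0-9a-fA-F]+)-([0-9a-fA-F]+)\s*:\s*(.*)$")

# Hypothetical sample map, NOT taken from either system in this thread.
SAMPLE_IOMEM = "\n".join([
    "00000000-00000fff : Reserved",
    "00001000-0009fbff : System RAM",
    "000a0000-000bffff : PCI Bus 0000:00",
    "00100000-7ffeffff : System RAM",
    "  01000000-01c031d0 : Kernel code",
    "fd000000-fdffffff : PCI Bus 0000:00",
    "  fd000000-fdffffff : 0000:00:02.0",
])


def parse_iomem(text):
    """Return a list of (depth, start, end, name) tuples, in file order."""
    regions = []
    for line in text.splitlines():
        match = IOMEM_LINE.match(line)
        if not match:
            continue  # skip blank or unexpected lines
        indent, start, end, name = match.groups()
        # /proc/iomem indents nested resources by two spaces per level.
        regions.append((len(indent) // 2, int(start, 16), int(end, 16),
                        name.strip()))
    return regions


def regions_containing(regions, addr):
    """Return every region (outermost first) whose range includes addr."""
    return [r for r in regions if r[1] <= addr <= r[2]]


def is_system_ram(regions, addr):
    """True if addr falls inside a region named 'System RAM'."""
    return any(name == "System RAM"
               for _, _, _, name in regions_containing(regions, addr))
```

Running regions_containing() on the same TLP address against the map
from the working and the non-working system would show whether the
address lands in System RAM on one and in a reserved or PCI window on
the other. This only inspects CPU physical addresses; if an IOMMU is
enabled, the bus address the FPGA uses may differ from the physical
address, so a mismatch here is a hint rather than a proof.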

Thank you,
Ran

Attachment: lspci_vv
Description: Binary data

