On Fri, Dec 6, 2019 at 7:57 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Fri, Dec 06, 2019 at 06:48:24PM +0200, Ranran wrote:
> > On Fri, Dec 6, 2019 at 5:08 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > On Fri, Dec 06, 2019 at 08:09:48AM +0200, Ranran wrote:
> > > > On Fri, Nov 29, 2019 at 8:38 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > On Fri, Nov 29, 2019 at 06:10:51PM +0200, Ranran wrote:
> > > > > > On Fri, Nov 29, 2019 at 4:58 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > > > On Fri, Nov 29, 2019 at 06:59:48AM +0000, bugzilla-daemon@xxxxxxxxxxxxxxxxxxx wrote:
> > > > > > > > https://bugzilla.kernel.org/show_bug.cgi?id=205701
> > >
> > > > I have tried to upgrade to latest kernel 5.4 (elrepo in centos), but
> > > > with this processor/board (system x3650, Xeon), it get hang during
> > > > kernel boot, without any error in dmesg, just keeps waiting for
> > > > nothing for couple of minutes and than drops to dracut.
> > >
> > >   - I don't think you ever said exactly what the original failure mode
> > >     was. You said DMA from an FPGA failed. What is the specific
> > >     device? How do you know the DMA fails?
> >
> > FPGA is Intel's Arria 10 device.
>
> I really meant which bus/device/function it is so we can correlate it
> with the dmesg log and lspci output.
>
> > We know that DMA fails because on using signaltap/probing the DMA
> > transaction from FPGA to CPU's RAM we see that it stall, i.e. keep
> > waiting for the access to finish.
> > We don't observe any error in dmesg.
>
> I'm not familiar with Signal Tap, but Google suggests that it's
> basically an embedded logic analyzer on the FPGA itself. So I assume
> that:
>
>   - On the working system (Intel DUO?) Signal Tap shows the PCIe
>     Memory Read TLP from the FPGA and the matching Completion.
>
>   - On the non-working system Signal Tap shows the PCIe Memory Read
>     TLP from the FPGA but the Completion never arrives. I assume the
>     FPGA eventually logs a Completion Timeout error?
>
> My guess would be something's wrong with the address the FPGA is
> generating. So please collect the complete dmesg log and /proc/iomem
> contents and the address used in the FPGA DMA TLP from both the
> working and non-working systems. There should be some clue if we
> look at the differences between the systems.
>
> > > You may also be able to just drop a v5.4 kernel on your v4.18
> > > system, at least for testing purposes.
> > >
> > What does it mean to drop 5.4 kernel on 4.18 kernel ?
>
> Not on a v4.18 *kernel*; on the CentOS *file system* that was
> installed along with your v4.18-based kernel. If you take a v5.4
> kernel built with the right config options/modules/etc, it should work
> on the same root filesystem as the v4.18 kernel.
>
> Bjorn

Hello,

I've installed Ubuntu 19.10 with kernel 5.3, and I still see the same
issue with the Xeon.
I've attached the output of lspci -vv.

Thank you,
Ran
Attachment:
lspci_vv
Description: Binary data
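
Bjorn's request above (compare the address in the FPGA DMA TLP against
/proc/iomem on the working and non-working systems) could be partly
automated. What follows is only a minimal sketch, not anything posted in
this thread: the helper names parse_iomem and regions_containing are
hypothetical, and the SAMPLE text is made-up data in the /proc/iomem
layout, not output from either machine. It assumes the usual format of
"start-end : name" in hex, with two spaces of indentation per nesting
level.

```python
def parse_iomem(text):
    """Parse /proc/iomem-style text into (start, end, depth, name) tuples."""
    entries = []
    for line in text.splitlines():
        if " : " not in line:
            continue  # skip blank or unexpected lines
        span, name = line.split(" : ", 1)
        # Nested resources are indented by two spaces per level.
        depth = (len(span) - len(span.lstrip(" "))) // 2
        start, end = (int(x, 16) for x in span.strip().split("-"))
        entries.append((start, end, depth, name.strip()))
    return entries


def regions_containing(entries, addr):
    """Return names of all resources containing addr, outermost first."""
    hits = [e for e in entries if e[0] <= addr <= e[1]]
    return [name for _, _, depth, name in sorted(hits, key=lambda e: e[2])]


# Made-up sample in /proc/iomem layout (NOT from the systems in this thread).
SAMPLE = """\
00001000-0009fbff : System RAM
000a0000-000bffff : PCI Bus 0000:00
90000000-c7ffbfff : PCI Bus 0000:00
  c0000000-c01fffff : PCI Bus 0000:03
    c0000000-c00fffff : 0000:03:00.0
100000000-47fffffff : System RAM
"""

entries = parse_iomem(SAMPLE)
print(regions_containing(entries, 0x100001000))  # a RAM address in the sample
print(regions_containing(entries, 0xC0000010))   # lands in a device BAR
print(regions_containing(entries, 0x50000000))   # not claimed by anything
```

On a real system the text would come from reading /proc/iomem as root
(unprivileged reads show the addresses as zeros). If the address the
FPGA puts in its Memory Read TLP does not fall inside a "System RAM"
range on the Xeon but does on the working machine, that difference
would be a starting point. With an IOMMU enabled the bus address is
translated before it reaches memory, so a mismatch here is a hint
rather than proof.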