Re: [PATCH v3 00/15] PCI/iommu: Fix DMA alias problems

[+cc original lists]

Hi Edward,

On Tue, 2014-05-13 at 15:35 -0700, eddy0596 wrote:
> Hello Alex,
> 
> Thanks for working on a fix for this long-standing issue. I have applied the
> AMD portion of the IOMMU patches against the 3.14.3 kernel and found the
> following:
> 1) The computer will not boot from a cold start. The kernel log
> shows that it hangs at the point where the kernel attempts to attach the SCSI
> disk [sdk] that connects to the LSI-SAS2008 controller at pci 04:00.0. I can
> use Ctrl+Alt+Del to reboot the computer, so I guess the kernel didn't "hang",
> and I don't see any oops either.
> 2) After a warm reboot with Ctrl+Alt+Del, the kernel boots up fine, and
> the Marvell controller behaves properly (more stress testing needed), as do
> the two LSI-SAS2008 controllers. A warm reboot after a hard reset at the BIOS
> prompt also boots up fine.

Both of these indicate that the hand-off state of the system is
different between a warm and a cold reset.  Can you capture the boot
messages (serial console or netconsole) of each case and add the
pci=earlydump option so we can compare the PCI state?
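
Once both logs are captured, the comparison could be done along these
lines.  This is only a minimal sketch, assuming each boot was saved to a
plain-text file; the function names and the two file arguments are
hypothetical.  It strips the printk timestamps so that timing noise
doesn't hide the real differences, then prints a unified diff.

```python
import difflib
import re
import sys

# printk timestamp prefix such as "[    0.123456] " (assumed to be enabled)
_TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s?")


def strip_timestamps(lines):
    """Drop the leading printk timestamp and trailing newline from each line."""
    return [_TIMESTAMP.sub("", line.rstrip("\n")) for line in lines]


def diff_boot_logs(cold_lines, warm_lines):
    """Return unified-diff lines between a cold-boot and a warm-boot log."""
    cold = strip_timestamps(cold_lines)
    warm = strip_timestamps(warm_lines)
    return list(
        difflib.unified_diff(cold, warm, fromfile="cold", tofile="warm", lineterm="")
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python bootdiff.py cold.log warm.log
    with open(sys.argv[1]) as cold_file, open(sys.argv[2]) as warm_file:
        print("\n".join(diff_boot_logs(cold_file.readlines(), warm_file.readlines())))
```

Lines that differ only in probe order will still show up, so the output
needs reading by hand; it just narrows down where to look.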

> 3) After removing sdk and performing a cold reboot, the kernel stops after
> attaching all the ST3000DM001 hard disks that connect to the LSI-SAS2008 at
> pci 01:00.0. The kernel stops at "ata12: SATA link down (SStatus 0 SControl
> 300)".
> 4) After removing sda and sdl, which connect to the Marvell 88SE9172 at pci
> 09:00.0, the kernel stops after attaching the eight ST3000DM001 disks that
> connect to the LSI-SAS2008 at pci 01:00.0.

So it's not an issue with those specific disks.  Is it possible to
remove the controller, or disable it in the BIOS, to further isolate
the problem?

> 5) A cold start with a kernel without the IOMMU patches boots fine, except
> for a number of kernel oopses related to the Marvell controller complaining
> about invalid PCI access from the AMD IOMMU.

Is this kernel built from the same source tree as below without the
indicated IOMMU patches applied?

> Attached is the kernel boot log obtained with all HDDs attached, from a
> successful boot after a warm reboot, plus some information on my setup.
> Let me know if you need more information or logs to help with debugging.

The mailing list doesn't like attachments, but it was included in the
re-send to me where it was inline.  An unsuccessful boot log is probably
the most interesting, preferably with the pci=earlydump option (and
continue to use the amd_iommu_dump option as well).  Also, what happens
with amd_iommu=off?  If we're not getting any IOMMU faults, it seems
like the patches are doing their job and I'm at a bit of a loss to
understand how it would fail only on a cold boot.
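
To confirm which of these options a given boot actually ran with (on a
boot that reaches userspace), something like the following could be
used.  It is a rough sketch, not a full parser: the helper names are
hypothetical, it ignores quoted parameters containing spaces, and it
only looks for the three options mentioned above.

```python
def parse_cmdline(cmdline):
    """Map each kernel parameter name to the list of values it was given."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        # "pci=" takes a comma-separated list; bare flags get no values
        values = value.split(",") if sep else []
        params.setdefault(name, []).extend(values)
    return params


def requested_options(cmdline):
    """Report which of the debugging options from this thread are present."""
    params = parse_cmdline(cmdline)
    return {
        "pci=earlydump": "earlydump" in params.get("pci", []),
        "amd_iommu_dump": "amd_iommu_dump" in params,
        "amd_iommu=off": "off" in params.get("amd_iommu", []),
    }


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(requested_options(f.read()))
    except OSError:
        pass  # not on Linux, or /proc is not mounted
```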

It might also be useful to test the branch provided in case there's an
issue with backporting the patches to 3.14.

> Best Regards,
> 
> Edward Cheung
> 
> Motherboard: Gigabyte GA-990FXA-UD5 Revision 1.0. Note that the kernel is
> using the software IO TLB, I believe due to a broken IVRS table. I am still
> trying to find a fix for this.

What brings you to the conclusion that the IVRS table is broken?  IIRC,
AMD-Vi initializes the swiotlb to support passthrough devices that can
only do 32-bit DMA... or something like that.  So I don't think it's
unusual to see it initialized alongside AMD IOMMU.  Thanks,

Alex

