Re: Since Linux 4.1: A lot of AMD-Vi IO_PAGE_FAULTs

On 07/24/2015 at 06:15 PM, Bjorn Helgaas wrote:
> [+cc Tejun, linux-ide]
> 
> On Thu, Jul 23, 2015 at 11:22 PM, Andreas Hartmann
> <andihartmann@xxxxxxxxxx> wrote:
>> On Tue, Jul 21, 2015 at 06:35PM +0200, Joerg Roedel wrote:
>>> On Tue, Jul 21, 2015 at 06:20:23PM +0200, Andreas Hartmann wrote:
>>>> [   48.193901] <6>[fglrx] Firegl kernel thread PID: 1840
>>>> [   48.193985] <6>[fglrx] Firegl kernel thread PID: 1841
>>>> [   48.194063] <6>[fglrx] Firegl kernel thread PID: 1842
>>>> [   48.194172] <6>[fglrx] IRQ 28 Enabled
>>>> [   48.261580] <6>[fglrx] Reserved FB block: Shared offset:0, size:1000000
>>>> [   48.261586] <6>[fglrx] Reserved FB block: Unshared offset:f7b4000, size:4000
>>>> [   48.261587] <6>[fglrx] Reserved FB block: Unshared offset:f7b8000, size:548000
>>>> [   48.261588] <6>[fglrx] Reserved FB block: Unshared offset:3fff3000, size:d000
>>>
>>> From a first glance it doesn't look like an IOMMU driver issue, because
>>> the addresses where the faults happen are not from the AMD IOMMU driver.
>>>
>>> And you have proprietary closed-source drivers loaded. Can you reproduce
>>> the issue without fglrx?
>>
>> Yes. I attached a log from a boot without fglrx.
>>
>> Meanwhile I tested 4.0.9, too. I wasn't able to reproduce the
>> problem with that kernel, even after lots of reboots. With 4.1 the
>> problem usually shows up during the boot process, but not only then;
>> it can appear after boot, too.
>>
>> The pattern is always the same: one of the SATA disks reports errors,
>> and at the same time IO_PAGE_FAULT errors appear, as described before:
>>
>> [  152.533708] ata3.00: failed command: READ FPDMA QUEUED
>> [  152.538102] ata3.00: failed command: READ FPDMA QUEUED
>> [  152.539862] ata3.00: failed command: READ FPDMA QUEUED
>> [  152.541778] ata3.00: failed command: WRITE FPDMA QUEUED
>> [  152.543861] ata3.00: failed command: WRITE FPDMA QUEUED
>>
>> [ 5818.068050] ata2.00: failed command: WRITE FPDMA QUEUED
>> [ 5818.068059] ata2.00: failed command: WRITE FPDMA QUEUED
>>
>> I compared the dmesg output of 4.1 with that of 4.0 and noticed that
>> the following entries are *missing* in 4.1:
>>
>> [    0.000000] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
>> [    0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x01] enabled)
>> [    0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
>> [    0.000000] ACPI: LAPIC (acpi_id[0x03] lapic_id[0x03] enabled)
>> [    0.000000] ACPI: LAPIC (acpi_id[0x04] lapic_id[0x04] enabled)
>> [    0.000000] ACPI: LAPIC (acpi_id[0x05] lapic_id[0x05] enabled)
>> [    0.000000] ACPI: LAPIC (acpi_id[0x06] lapic_id[0x06] enabled)
>> [    0.000000] ACPI: LAPIC (acpi_id[0x07] lapic_id[0x07] enabled)
>>
>>
>> What does this mean? Is some part of the ACPI initialization missing?
>>
>>
>> Thanks for any hint; with these errors, Linux 4.1 is completely
>> unusable here.
> 
> This looks more like an AHCI problem than an IOMMU or PCI problem.
> Seems like the device has the wrong idea about where its DMA buffers
> are.  Maybe something scribbled on its command list?

During further testing I found that the problem already occurs in
Linux 4.0. So far I haven't seen it in 3.19.8.


I tried hard to bisect it. Two of three rounds ended at the following
commit (the third round got stuck at a later point; unfortunately, the
problem doesn't show up on every boot :-( ):

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=be5e6616dd74e17fdd8e16ca015cfef94d49b467

Does this help?


> From your attachments:
> 
> # lspci -vvs 00:11.0
> 00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40) (prog-if 01 [AHCI 1.0])
> 
> pci 0000:00:11.0: [1002:4391] type 00 class 0x010601
> ahci 0000:00:11.0: version 3.0
> ahci 0000:00:11.0: AHCI 0001.0200 32 slots 6 ports 6 Gbps 0x3f impl SATA mode
> ahci 0000:00:11.0: flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x40eba32100618000 flags=0x0010]
> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x40eba32100618040 flags=0x0010]
> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x0000000000000000 flags=0x0000]
> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x00000000000000c0 flags=0x0000]
> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x0000000000000040 flags=0x0000]
> AMD-Vi: Event logged [IO_PAGE_FAULT device=00:11.0 domain=0x0008 address=0x00000000000001c0 flags=0x0000]


Regards,
Andreas