Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware bug question

[+cc Jeff, linux-ide, David, Joerg, iommu]

On Thu, Nov 29, 2012 at 7:39 PM, Robert Hancock <hancockrwd@xxxxxxxxx> wrote:
> On Thu, Nov 29, 2012 at 12:16 PM, Bjorn Helgaas <bhelgaas@xxxxxxxxxx> wrote:
>> On Thu, Nov 29, 2012 at 1:55 AM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx> wrote:
>>>
>>>
>>> -----Original Message-----
>>> From: Robert Hancock [mailto:hancockrwd@xxxxxxxxx]
>>> Sent: Wednesday, November 28, 2012 7:55 PM
>>> To: Justin Piszcz
>>> Cc: Bjorn Helgaas; Bruno Prémont; support@xxxxxxxxxxxxxx;
>>> linux-kernel@xxxxxxxxxxxxxxx; Dan Williams
>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>>> bug question
>>>
>>> On Wed, Nov 28, 2012 at 6:49 PM, Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx>
>>> wrote:
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Robert Hancock [mailto:hancockrwd@xxxxxxxxx]
>>>> Sent: Wednesday, November 28, 2012 7:35 PM
>>>> To: Justin Piszcz
>>>> Cc: 'Bjorn Helgaas'; 'Bruno Prémont'; support@xxxxxxxxxxxxxx;
>>>> linux-kernel@xxxxxxxxxxxxxxx; 'Dan Williams'
>>>> Subject: Re: Supermicro X9SRL-F - channel enumeration error & ACPI/firmware
>>>> bug question
>>>>
>>>>
>>>> What does lspci -vv show on that controller? Not sure what actual
>>>> chipset that controller is, but there's a known issue with some Marvell
>>>> 6Gbps SATA controllers with DMAR enabled - it seems the device issues
>>>> memory read/write requests from the wrong PCI function ID and the IOMMU
>>>> rightly denies access as the function listed in the requests doesn't
>>>> have any mapping to that memory. I don't think there's presently a
>>>> workaround other than disabling DMAR. We could (and likely should) be
>>>> detecting that device and adding some kind of quirk for it.
>>>>
>>>> That sounds likely...
>>>> It is shown below:
>>>>
>>>> Card name: HighPoint Rocket 620 Dual Port SATA 6 Gbps PCI Express 2.0 Host
>>>> Adapter
>>>>
>>>> lspci -vv output:
>>>>
>>>> 84:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA
>>>> 6.0 Gb/s controller (rev 11) (prog-if 01 [AHCI 1.0])
>>>>   Subsystem: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s
>>>> controller
>>>
>>> Yeah, that's one of those controllers I think. But I can't tell from
>>> the bit of the dmesg you posted exactly what's going on. Can you post
>>> a full boot log from having the card installed and some drive attached
>>> (by putting the boot drive on another controller for example)?
>>>
>>>>> ==> Further issues with the X9SRL-F -- does this board support ASPM or is
>>>>> this a Linux/ASPM implementation issue?
>>>>> [    0.632170]  pci0000:ff: ACPI _OSC support notification failed, disabling PCIe ASPM
>>>>> [    0.632239]  pci0000:ff: Unable to request _OSC control (_OSC support mask: 0x08)
>>>>
>>>> What's the full dmesg from this machine (or is it already posted somewhere)?
>>>>
>>>> It is now available here:
>>>> http://home.comcast.net/~jpiszcz/20121128/dmesg.txt
>>>
>>>> Is that the same boot log? It doesn't have this error in it.
>>>
>>> Yes, the error is here (it's towards the bottom):
>>>
>>> [    7.973015] ata14.00: qc timeout (cmd 0xa1)
>>> [    8.472120] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [    9.275922] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
>>> [   19.260667] ata14.00: qc timeout (cmd 0xa1)
>>> [   19.759828] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [   19.760451] ata14: limiting SATA link speed to 1.5 Gbps
>>> [   20.566598] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>>> [   50.521078] ata14.00: qc timeout (cmd 0xa1)
>>> [   51.020880] ata14.00: failed to IDENTIFY (I/O error, err_mask=0x4)
>>> [   51.824664] ata14: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
>>> [   51.824682] dmar: DRHD: handling fault status reg 502
>>> [   51.824686] dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0
>>> [   51.824686] DMAR:[fault reason 06] PTE Read access is not set
>>
>> You have these devices:
>>
>>     pci 0000:04:00.0: [10de:01d3] type 00 class 0x030000 nVidia G72
>>     pci 0000:84:00.0: [1b4b:9123] type 00 class 0x010601 Marvell 88SE9123 SATA
>>     pci 0000:84:00.1: [1b4b:91a4] type 00 class 0x01018f Marvell 88SE9128 IDE
>>
>> I think the 04:00.0 DMAR errors are symptoms of nouveau driver issues,
>> and if you get rid of that driver, they'll probably go away.
>>
>> But this 84:00.1 DMAR error:
>>
>>     dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000
>>     DMAR:[fault reason 02] Present bit in context entry is clear
>>
>> looks like the probable cause of the Marvell issue.  It looks similar
>> to https://bugzilla.kernel.org/show_bug.cgi?id=42679, although the
>> reports there show a bb:dd.0 device (but no bb:dd.1 device), and the
>> DMAR rejects DMA that appears to be from bb:dd.1.
>>
>> Another report that's even more similar is
>> https://bugzilla.redhat.com/show_bug.cgi?id=757166 .  In that case,
>> both bb:dd.0 and bb:dd.1 exist (as in your system), and the DMAR fault
>> is exactly like what you're seeing.
>>
>> So you're not alone, but unfortunately, nobody seems to be working on
>> either bug report.  I took the liberty to add you to the cc: list of
>> both.
>>
>> I don't really know what else to do at this point.  Maybe a SATA
>> expert with some Marvell docs could figure out why we're seeing DMA
>> from the IDE controller, but I'm not that person :)
>
> I doubt any Marvell docs would really be very helpful (except for
> maybe an errata list but that likely would just tell us what we can
> already figure out). The SATA controller part of the device seems to
> just be issuing accesses with the wrong PCI function ID.
>
> The only solution I can think of would be at the PCI/DMAR layer -
> basically functions 0 and 1 on this device should be allowed to access
> each other's DMA regions.

That's essentially the patch at
https://bugzilla.redhat.com/show_bug.cgi?id=757166#c16, which in my
opinion is too ugly to consider.  But fortunately, I'm not the
maintainer for any IOMMU drivers.

My point about the docs is that often we think "this hardware is
clearly broken and the only workaround is X," but sometimes it's just
that we don't understand the hardware designer's intent.  It may be
that the hardware was just never tested with DMAR and is indeed
broken, or it may be that it does work with DMAR given a different
driver structure or different device initialization.  I just don't
want lack of imagination to force us to assume there's only one
workaround.
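
For anyone triaging similar reports, here is a minimal Python sketch (not
from this thread, and not a fix) that pulls the requester ID out of DMAR
fault lines like the ones quoted above and checks whether another function
exists in the same bus:device slot, which is the pattern reported for the
Marvell 88SE9123.  The regex only covers the "Request device [bb:dd.f]
fault addr ..." format shown in this thread; the names parse_faults and
sibling_functions, and the hand-written device set, are hypothetical and
for illustration only.

```python
import re

# Matches lines in the format quoted in this thread, e.g.:
#   dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000
FAULT_RE = re.compile(
    r"DMAR:\[DMA (Read|Write)\] Request device "
    r"\[([0-9a-f]{2}):([0-9a-f]{2})\.([0-7])\] fault addr ([0-9a-f]+)",
    re.IGNORECASE,
)


def parse_faults(log_text):
    """Return (direction, bus, dev, fn, addr) for each DMAR fault line."""
    faults = []
    for m in FAULT_RE.finditer(log_text):
        direction, bus, dev, fn, addr = m.groups()
        faults.append(
            (direction, int(bus, 16), int(dev, 16), int(fn), int(addr, 16))
        )
    return faults


def sibling_functions(fault, known_devices):
    """List the other functions present in the faulting requester's slot.

    known_devices is a set of (bus, dev, fn) tuples, e.g. transcribed by
    hand from the "pci 0000:bb:dd.f" lines in dmesg (an assumption here).
    """
    _, bus, dev, fn, _ = fault
    return sorted(
        f for (b, d, f) in known_devices if (b, d) == (bus, dev) and f != fn
    )


if __name__ == "__main__":
    # The two fault lines and three devices mentioned in this thread.
    log = (
        "dmar: DMAR:[DMA Read] Request device [04:00.0] fault addr 0\n"
        "dmar: DMAR:[DMA Read] Request device [84:00.1] fault addr fff00000\n"
    )
    known = {(0x04, 0x00, 0), (0x84, 0x00, 0), (0x84, 0x00, 1)}
    for fault in parse_faults(log):
        print(fault, "siblings:", sibling_functions(fault, known))
```

A fault whose requester has a sibling function in the same slot is only a
hint that the DMA may really belong to the sibling; it does not prove it.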

