Re: aer_inject vs. apei/einj

[+cc linux-pci]

On Fri, Feb 19, 2016 at 4:09 AM, Jean Delvare <jdelvare@xxxxxxx> wrote:
> On Wed, 17 Feb 2016 11:03:31 -0600, Bjorn Helgaas wrote:
>> [+cc Huang, author of both aer_inject and apei/einj]
>>
>> On Wed, Feb 17, 2016 at 7:33 AM, Jean Delvare <jdelvare@xxxxxxx> wrote:
>> > Hi all,
>> >
>> > I am looking for some guidance regarding AER testing. I see that we
>> > have two different drivers for error injection in the kernel:
>> > aer_inject and apei/einj. The user-space aer-inject tool seems to only
>> > care about the former.
>> >
>> > How does one know which driver should be used on a given system? I
>> > suppose that only one of them will work on a given system?
>> >
>> > My impression is that aer_inject is for "native" AER handling while
>> > apei/einj is for ACPI-driven AER. Is it correct? If not I would
>> > appreciate some pointers explaining when aer_inject should be used and
>> > when apei/einj should be used.
>>
>> My understanding is that:
>>
>>   - aer_inject does not actually write to any hardware registers
>> itself (though I do see it writes to some masks).  It works by
>> replacing the PCI config accessors with new ones that make it look
>> like the AER registers have errors logged.
>>
>>   - apei/einj runs ACPI methods that apparently seed errors.  These
>> might use hardware support for seeding errors, which would of course
>> be platform-dependent.
>>
>> So aer_inject should work on any system at all.  I think apei/einj
>
> My problem is precisely that aer_inject doesn't work on any system I
> tested. Either the device doesn't support AER, or its root port doesn't
> support AER, or (the furthest I've gotten) the "error device" of the root port
> doesn't exist.

OK, I should have said "aer_inject" should work on any system with
devices that support AER :)  And there are also some conditions
related to _OSC and "firmware-first" error handling, based on the HEST
table.  I expect that dmesg would show whether we can use AER and why
it might be disabled (and if dmesg doesn't show that, it *should*).
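To make the accessor-replacement idea from the quoted explanation concrete, here is a minimal user-space simulation. It is not aer_inject's code. The names FakeConfigSpace and make_injecting_reader are hypothetical, and the only real constants assumed are the PCIe AER capability offsets (Uncorrectable Error Status at +0x04, Correctable Error Status at +0x10). Reads of the AER status registers return injected bits while every other read passes straight through, so the "hardware" registers never change:

```python
from typing import Callable, Dict

# Offsets inside the AER extended capability (Linux: PCI_ERR_UNCOR_STATUS,
# PCI_ERR_COR_STATUS).
AER_UNCOR_STATUS = 0x04
AER_COR_STATUS = 0x10

ReadFn = Callable[[int], int]


class FakeConfigSpace:
    """Hypothetical stand-in for a device's 32-bit config registers."""

    def __init__(self, regs: Dict[int, int]) -> None:
        self.regs = dict(regs)

    def read(self, offset: int) -> int:
        return self.regs.get(offset, 0)


def make_injecting_reader(original: ReadFn, aer_base: int,
                          injected: Dict[int, int]) -> ReadFn:
    """Hypothetical wrapper: AER status reads also report injected bits.

    `injected` maps an offset relative to the AER capability (e.g.
    AER_COR_STATUS) to the status bits to pretend are set.  Nothing is
    written to the underlying registers.
    """
    def read(offset: int) -> int:
        value = original(offset)
        rel = offset - aer_base
        if rel in injected:
            value |= injected[rel]
        return value
    return read


if __name__ == "__main__":
    AER_BASE = 0x100  # assumption: AER capability located at 0x100
    dev = FakeConfigSpace({0x00: 0x12348086, AER_BASE + AER_COR_STATUS: 0})
    # Pretend a correctable Receiver Error (bit 0) was logged.
    read = make_injecting_reader(dev.read, AER_BASE, {AER_COR_STATUS: 0x1})
    print(hex(read(AER_BASE + AER_COR_STATUS)))  # injected bit visible
    print(hex(read(0x00)))                        # passed through unchanged
    print(dev.regs[AER_BASE + AER_COR_STATUS])    # real register untouched
```

A fuller simulation would also need to fake the root port's error status and handle write-1-to-clear writes; this sketch only shows the read-side interposition.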

> I am not too familiar with PCIe but apparently PCIe
> devices can have "sub-devices" which do not show in "lspci" but show up
> in /sys/bus/pci_express/devices. I have yet to see an aer sub-device
> there on any of my systems.

Yes, the different PCIe services (AER, native hotplug, VC, etc.) are
handled sort of like subdevices.  This seems a bit hacky to me but
it's what we have.
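One way to check whether an AER service "subdevice" exists is to look at which port service devices the AER service driver has bound. A minimal sketch, assuming the service devices live on the pci_express bus in sysfs, that the AER service driver is registered as "aer" (so bound devices appear under /sys/bus/pci_express/drivers/aer), and that service device names contain ":pcie"; verify all three on the running kernel. The function name is hypothetical:

```python
import os
from typing import List

PCIE_SERVICE_BUS = "/sys/bus/pci_express"  # assumed sysfs location


def aer_service_devices(bus_root: str = PCIE_SERVICE_BUS) -> List[str]:
    """Hypothetical helper: names of port service devices bound to "aer".

    Assumes bound devices show up in the driver directory with names that
    contain ":pcie"; control files such as "bind", "unbind" and "uevent"
    are skipped.
    """
    driver_dir = os.path.join(bus_root, "drivers", "aer")
    if not os.path.isdir(driver_dir):
        return []  # AER service driver not registered, or bus absent
    return sorted(name for name in os.listdir(driver_dir) if ":pcie" in name)


if __name__ == "__main__":
    devs = aer_service_devices()
    print("\n".join(devs) if devs else "no AER port service devices found")
```

An empty result on a system whose root port has the AER capability is exactly the "we must have disabled it somehow" case, which is where dmesg should explain why.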

Anyway, if you have a system where the root port and a device below it
support AER, but there's no subdevice for it, we must have disabled it
somehow.  Can you collect a dmesg and "lspci -vv" for it?  We should
be logging a clue there.
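In the "lspci -vv" output, the AER capability shows up as an "Advanced Error Reporting" capability line, provided lspci can read extended config space (normally this needs root; otherwise the capabilities are reported as access denied). The hypothetical helper below, written against that assumed output layout (unindented "slot description" header lines followed by indented detail lines), lists the devices whose block mentions AER, which makes it quick to check both the root port and the device below it:

```python
import subprocess
from typing import List, Optional


def devices_with_aer(lspci_vv_text: str) -> List[str]:
    """Hypothetical helper: slot addresses whose lspci -vv block lists
    the Advanced Error Reporting capability."""
    found: List[str] = []
    current: Optional[str] = None
    for line in lspci_vv_text.splitlines():
        if line and not line[0].isspace():
            # A new device block starts with an unindented "slot description" line.
            current = line.split(None, 1)[0]
        elif current and "Advanced Error Reporting" in line:
            if current not in found:
                found.append(current)
    return found


if __name__ == "__main__":
    # Run as root so lspci can read the extended capabilities.
    out = subprocess.run(["lspci", "-vv"], capture_output=True, text=True,
                         check=True).stdout
    print(devices_with_aer(out))
```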

>> will only work if the platform supplies an EINJ table, and even when
>> it does, I suspect different platforms probably have different
>> injection capabilities.
>>
>> Huang probably can give a much better response.
>
> Huang, pleeeease? :)
>
> --
> Jean Delvare
> SUSE L3 Support
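For the apei/einj side, whether the platform supplies an EINJ table, and which error types it offers, can be checked from user space. A minimal sketch, assuming ACPI tables are exposed under /sys/firmware/acpi/tables and debugfs is mounted at /sys/kernel/debug, so that the loaded einj driver's available_error_type file is at apei/einj/available_error_type. The PCIe type values are the ACPI EINJ error-type bits; the function names are hypothetical:

```python
import os
from typing import Dict

ACPI_TABLES = "/sys/firmware/acpi/tables"      # assumed sysfs location
EINJ_DEBUGFS = "/sys/kernel/debug/apei/einj"   # assumes debugfs mounted here

# ACPI EINJ error-type bits for PCI Express errors.
PCIE_EINJ_TYPES = {0x40: "PCIe correctable",
                   0x80: "PCIe uncorrectable non-fatal",
                   0x100: "PCIe uncorrectable fatal"}


def has_einj_table(tables_dir: str = ACPI_TABLES) -> bool:
    """Hypothetical helper: True if firmware exposes an EINJ table."""
    return os.path.exists(os.path.join(tables_dir, "EINJ"))


def parse_available_error_types(text: str) -> Dict[int, str]:
    """Hypothetical parser for lines of the form "0xNNNNNNNN<tab>name"."""
    types: Dict[int, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if parts and parts[0].lower().startswith("0x"):
            types[int(parts[0], 16)] = parts[1].strip() if len(parts) > 1 else ""
    return types


if __name__ == "__main__":
    print("EINJ table present:", has_einj_table())
    path = os.path.join(EINJ_DEBUGFS, "available_error_type")
    if os.path.exists(path):  # reading debugfs normally requires root
        with open(path) as f:
            avail = parse_available_error_types(f.read())
        print("PCIe types offered:",
              [PCIE_EINJ_TYPES[t] for t in avail if t in PCIE_EINJ_TYPES])
```

Since the set of offered types comes from firmware, two platforms that both have an EINJ table can still differ in whether any PCIe error types are injectable at all.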
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


