Re: [Linaro-acpi] [RFC] ACPI on arm64 TODO List

On Tuesday 13 January 2015 17:26:33 Al Stone wrote:
> On 01/13/2015 10:22 AM, Grant Likely wrote:
> > On Mon, Jan 12, 2015 at 7:40 PM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> >> On Monday 12 January 2015 12:00:31 Grant Likely wrote:
> >>> RAS is also something where every company already has something that
> >>> they are using on their x86 machines. Those interfaces are being
> >>> ported over to the ARM platforms and will be equivalent to what they
> >>> already do for x86. So, for example, an ARM server from DELL will use
> >>> mostly the same RAS interfaces as an x86 server from DELL.
> >>
> >> Right, I'm still curious about what those are, in case we have to
> >> add DT bindings for them as well.
> > 
> > Certainly.
> 
> In ACPI terms, the features used are called APEI (ACPI Platform Error
> Interfaces), and are defined in Section 18 of the specification.  The
> tables describe what the possible error sources are, where details about
> the error are stored, and what to do when the errors occur.  A lot of
> the "RAS tools" out there that report and/or analyze error data rely on
> this information being reported in the form given by the spec.
> 
> I only put "RAS tools" in quotes because it is indeed a very loosely
> defined term -- I've had everything from webmin to SNMP to ganglia,
> nagios and Tivoli described to me as a RAS tool.  In all of those cases,
> however, the basic idea was to capture errors as they occur, and try to
> manage them properly.  That is, replace disks that seem to be heading
> downhill, or look for faults in RAM, or dropped packets on LANs --
> anything that could help me avoid a catastrophic failure by doing some
> preventive maintenance up front.
> 
> And indeed a BMC is often used for handling errors in servers, or to
> report errors out to something like nagios or ganglia.  It could
> also just be a log in a bit of NVRAM, with a little daemon that
> reports back somewhere.  But this is why APEI is used: it tries to
> provide a well-defined interface between those reporting the error
> (firmware, hardware, OS, ...) and those that need to act on the error
> (the BMC, the OS, or even other bits of firmware).
> 
> Does that help satisfy the curiosity a bit?

Yes, it's much clearer now, thanks!

	Arnd