Re: Does redhat linux log all hardware events/issues/error in /var/log/mcelog?

Hi

Try ethtool, or look under /proc (for example /proc/net/dev), to check the
network interface statistics. For the HBAs, use the vendor's management
software for that hardware, installed on the OS.
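
As a rough sketch of that kind of check: the kernel also exposes the
per-interface counters through sysfs, under /sys/class/net/<interface>/statistics.
The function below just prints the error and drop counters it finds there. The
function name and the optional directory argument are made up for illustration
(the argument only exists so it can be pointed at a test directory).

```shell
#!/bin/bash
# Print error/drop counters for every network interface found in sysfs.
# $1 (optional, hypothetical): directory to scan, defaults to /sys/class/net.
nic_error_counters() {
    local root="${1:-/sys/class/net}"
    local dev name counter file
    for dev in "$root"/*; do
        # Skip anything that does not look like an interface directory.
        [ -d "$dev/statistics" ] || continue
        name="${dev##*/}"
        for counter in rx_errors tx_errors rx_dropped tx_dropped; do
            file="$dev/statistics/$counter"
            if [ -r "$file" ]; then
                printf '%s %s %s\n' "$name" "$counter" "$(cat "$file")"
            fi
        done
    done
}
```

Call it as "nic_error_counters" with no arguments. The counters are totals
since the driver was loaded, so a value that keeps climbing between runs says
more than a single non-zero reading.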

Example: if the HBA is from Emulex, you could use the HBAnyware CLI
commands to run a health check.
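
Independent of the vendor tool, Fibre Channel HBAs whose driver registers with
the kernel's FC transport class show up under /sys/class/fc_host, and each host
entry has a port_state file. A minimal sketch, assuming that is true for your
HBA driver; the function name and the optional directory argument are
hypothetical and only there for illustration and testing.

```shell
#!/bin/bash
# Print the port state of each Fibre Channel host found in sysfs.
# Returns non-zero if any port reports a state other than "Online".
# $1 (optional, hypothetical): directory to scan, defaults to /sys/class/fc_host.
fc_port_states() {
    local root="${1:-/sys/class/fc_host}"
    local host state rc=0
    for host in "$root"/host*; do
        # Skip entries without a readable port_state (or an unmatched glob).
        [ -r "$host/port_state" ] || continue
        state="$(cat "$host/port_state")"
        printf '%s %s\n' "${host##*/}" "$state"
        [ "$state" = "Online" ] || rc=1
    done
    return "$rc"
}
```

This only tells you whether the link is up as seen by the driver. It is not a
substitute for the vendor's own diagnostics.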

- Ajay
On Mar 13, 2012 9:39 PM, "unix syzadmin" <unixsyzadmin@xxxxxxxxx> wrote:
>
> Thanks.
> I have downloaded and installed the OpenManage from Dell.
> The following commands say if the health of system components is OK.
> omreport chassis - health of all main components of the system chassis
> omreport chassis processors - cpu health
> omreport chassis memory - memory health
> omreport chassis pwrsupplies - power supply health
> omreport storage controller - raid controller health
>
> However, this leaves out the integrated NIC ports and the HBAs.
> What Linux or Dell OpenManage commands can be used to confirm that those are
> healthy as well?
>
> Thanks,
>
>
> On Mon, Mar 12, 2012 at 9:00 PM, Paul Tader <ptader@xxxxxxxxxxxxxx> wrote:
>
> > On 3/12/12 5:28 PM, unix syzadmin wrote:
> >
> >> Hi,
> >>
> >> We run redhat linux on intel hardware (mostly Dell, lately dell R710s).
> >> We want to be able to catch any hardware issues when they occur to act on
> >> them as quickly as possible.
> >>
> >> My understanding is that all hardware events/issues/errors are logged in
> >> /var/log/mcelog (Machine Check Events log).  Is this correct?  Can't
> >> stress this enough; does it log all hardware issues
> >> (cpu,memory,disk,ethernet,fibre/hba etc) ?
> >>
> >> Thanks,
> >>
> >
> > I've used MCElog to catch some CPU events but I think you might want to
> > check out Dell's OpenManage client.  It will report/monitor a lot more
> > information.
> >
> >
> > http://linux.dell.com/wiki/index.php/Repository/OMSA
> >
> >
> > To install:
> >
> > # wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
> > # yum install srvadmin-base
> > # yum install srvadmin-storageservices
> >
> > (logout / login for environment variables to take effect)
> >
> > # /opt/dell/srvadmin/sbin/srvadmin-services.sh start
> > ...
> >
> > # omreport chassis
> > Health
> >
> > Main System Chassis
> >
> > SEVERITY : COMPONENT
> > Ok       : Fans
> > Ok       : Intrusion
> > Ok       : Memory
> > Ok       : Power Supplies
> > Ok       : Processors
> > Ok       : Temperatures
> > Ok       : Voltages
> > Ok       : Hardware Log
> > Ok       : Batteries
> >
> > # omreport chassis temps
> > Temperature Probes Information
> >
> > ------------------------------------
> > Main System Chassis Temperatures: Ok
> > ------------------------------------
> >
> > Index                     : 0
> > Status                    : Ok
> > Probe Name                : System Board Ambient Temp
> > Reading                   : 20.0 C
> > Minimum Warning Threshold : 8.0 C
> > Maximum Warning Threshold : 42.0 C
> > Minimum Failure Threshold : 3.0 C
> > Maximum Failure Threshold : 47.0 C
> >
> > # omreport storage pdisk controller=0
> >
> > List of Physical Disks on Controller SAS 6/iR Integrated (Embedded)
> >
> > Controller SAS 6/iR Integrated (Embedded)
> > ID                        : 0:0:0
> > Status                    : Ok
> > Name                      : Physical Disk 0:0:0
> > State                     : Online
> > Failure Predicted         : No
> > Certified                 : Not Applicable
> > Encryption Capable        : No
> > Secured                   : Not Applicable
> > Progress                  : Not Applicable
> > Bus Protocol              : SAS
> > Media                     : HDD
> > Capacity                  : 67.75 GB (72746008576 bytes)
> > Used RAID Disk Space      : 67.75 GB (72746008576 bytes)
> > Available RAID Disk Space : 0.00 GB (0 bytes)
> > Hot Spare                 : No
> > Vendor ID                 : DELL
> > Product ID                : ST973402SS
> > Revision                  : S229
> >
> > <snip>
> >
> > You get the idea.
> >
> > --
> > redhat-list mailing list
> > unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
> > https://www.redhat.com/mailman/listinfo/redhat-list
> >

