Thanks.

I have downloaded and installed OpenManage from Dell. The following commands report whether the system components are healthy:

omreport chassis - health of all main components of the system chassis
omreport chassis processors - CPU health
omreport chassis memory - memory health
omreport chassis pwrsupplies - power supply health
omreport storage controller - RAID controller health

However, this leaves out the integrated NIC ports and the HBA adapters. What Linux or Dell OpenManage commands can be used to confirm that those are healthy as well?

Thanks,

On Mon, Mar 12, 2012 at 9:00 PM, Paul Tader <ptader@xxxxxxxxxxxxxx> wrote:

> On 3/12/12 5:28 PM, unix syzadmin wrote:
>
>> Hi,
>>
>> We run Red Hat Linux on Intel hardware (mostly Dell, lately Dell R710s).
>> We want to be able to catch any hardware issues when they occur, so we can
>> act on them as quickly as possible.
>>
>> My understanding is that all hardware events/issues/errors are logged in
>> /var/log/mcelog (Machine Check Events log). Is this correct? I can't stress
>> this enough: does it log all hardware issues
>> (CPU, memory, disk, Ethernet, fibre/HBA, etc.)?
>>
>> Thanks,
>>
>
> I've used mcelog to catch some CPU events, but I think you might want to
> check out Dell's OpenManage client. It will report/monitor a lot more
> information.
>
> http://linux.dell.com/wiki/index.php/Repository/OMSA
>
> To install:
>
> # wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
> # yum install srvadmin-base
> # yum install srvadmin-storageservices
>
> (log out and log back in for the environment variables to take effect)
>
> # /opt/dell/srvadmin/sbin/srvadmin-services.sh start
> ...
>
> # omreport chassis
> Health
>
> Main System Chassis
>
> SEVERITY : COMPONENT
> Ok : Fans
> Ok : Intrusion
> Ok : Memory
> Ok : Power Supplies
> Ok : Processors
> Ok : Temperatures
> Ok : Voltages
> Ok : Hardware Log
> Ok : Batteries
>
> # omreport chassis temps
> Temperature Probes Information
>
> ------------------------------------
> Main System Chassis Temperatures: Ok
> ------------------------------------
>
> Index : 0
> Status : Ok
> Probe Name : System Board Ambient Temp
> Reading : 20.0 C
> Minimum Warning Threshold : 8.0 C
> Maximum Warning Threshold : 42.0 C
> Minimum Failure Threshold : 3.0 C
> Maximum Failure Threshold : 47.0 C
>
> # omreport storage pdisk controller=0
>
> List of Physical Disks on Controller SAS 6/iR Integrated (Embedded)
>
> Controller SAS 6/iR Integrated (Embedded)
> ID : 0:0:0
> Status : Ok
> Name : Physical Disk 0:0:0
> State : Online
> Failure Predicted : No
> Certified : Not Applicable
> Encryption Capable : No
> Secured : Not Applicable
> Progress : Not Applicable
> Bus Protocol : SAS
> Media : HDD
> Capacity : 67.75 GB (72746008576 bytes)
> Used RAID Disk Space : 67.75 GB (72746008576 bytes)
> Available RAID Disk Space : 0.00 GB (0 bytes)
> Hot Spare : No
> Vendor ID : DELL
> Product ID : ST973402SS
> Revision : S229
>
> <snip>
>
> You get the idea.
>
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
>
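A rough sketch of how the omreport health checks discussed in this thread could be scripted (for example from cron) so that anything other than "Ok" gets flagged. The function name check_omreport is hypothetical and is not part of OpenManage. It only assumes the default text layouts shown in the example output in this thread: a "SEVERITY : COMPONENT" summary table from omreport chassis, and "Status : ..." / "Failure Predicted : ..." fields in detail listings such as temps and pdisk. Other omreport reports may be laid out differently, so treat this as a starting point rather than a complete monitor.

```shell
# check_omreport: hypothetical helper, NOT part of Dell OpenManage.
# Reads omreport default text output on stdin, prints every line that does
# not look healthy, and returns 1 if anything was printed (0 otherwise).
# Assumes the two layouts shown in this thread:
#   summary table (omreport chassis):   "Ok : Fans" rows after a "SEVERITY : COMPONENT" header
#   detail records (temps, pdisk, ...): "Status : Ok" and "Failure Predicted : No" fields
check_omreport() {
    awk -F':' '
        # strip leading and trailing blanks from a field
        function trim(s) {
            sub(/^[ \t]+/, "", s)
            sub(/[ \t]+$/, "", s)
            return s
        }
        {
            key = trim($1)
            val = trim($2)
            if (key == "SEVERITY" && val == "COMPONENT") { in_summary = 1; next }
            if (NF < 2) { in_summary = 0; next }   # a line without a colon ends the summary table
            if (in_summary) {
                if (key != "Ok") { print; bad = 1 }   # left column of the table is the severity
            } else if (key == "Status" && val != "Ok") {
                print; bad = 1
            } else if (key == "Failure Predicted" && val != "No") {
                print; bad = 1
            }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

For example, using commands from this thread:

# omreport chassis | check_omreport || echo "chassis: attention needed"
# omreport storage pdisk controller=0 | check_omreport || echo "disks: attention needed"

Parsing the human-readable output keeps the sketch tied to what was actually shown here; it simply re-checks the statuses omreport already reports.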