How to monitor health and connectivity of OSDs


Is there an equivalent of 'ceph health', but for OSDs?

Something that warns about slowness or about communication problems between OSDs?

I spent a good amount of time debugging what looked like stuck PGs,
but it turned out to be a bad NIC, and that only became apparent once I
saw OSD log lines like these:

2016-02-08 03:42:27.810289 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.14 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:27.810297 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.15 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:28.311125 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.14 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:28.311124)

(It turned out to be a bad NIC; fuck Emulex.)
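The heartbeat_check lines above follow a fixed pattern, so they can be pulled out of an OSD log mechanically. Below is a minimal Python sketch that extracts the timestamp, the reporting OSD, the unresponsive peer, and the number printed after the reporter id from one such line. It assumes the wording is exactly as shown above, which may differ between Ceph releases. It also assumes the number after `osd.N` is the OSD map epoch. `parse_heartbeat_line` and `HeartbeatFailure` are hypothetical names, not part of Ceph.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Matches the heartbeat_check lines quoted above. Assumption: the log wording
# is exactly "osd.N EPOCH heartbeat_check: no reply from osd.M ...".
HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ -?\d+ "
    r"osd\.(?P<reporter>\d+) (?P<epoch>\d+) heartbeat_check: "
    r"no reply from osd\.(?P<peer>\d+)"
)


@dataclass
class HeartbeatFailure:
    timestamp: datetime  # when the reporting OSD logged the failure
    reporter: int        # OSD that logged the missing reply
    peer: int            # OSD that did not answer
    epoch: int           # number after "osd.N", assumed to be the OSD map epoch


def parse_heartbeat_line(line: str) -> Optional[HeartbeatFailure]:
    """Return the parsed failure, or None if the line is not a heartbeat_check failure."""
    m = HEARTBEAT_RE.match(line)
    if m is None:
        return None
    return HeartbeatFailure(
        timestamp=datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S.%f"),
        reporter=int(m["reporter"]),
        peer=int(m["peer"]),
        epoch=int(m["epoch"]),
    )
```

Lines that do not match (ordinary log output) return None, so the function can be mapped over a whole log file and the None results filtered out.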

Is there anything that could dump something like "failed heartbeats in
the last 10 minutes" or similar stats?
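Lacking a built-in report, the "failed heartbeats in the last 10 minutes" figure can be approximated from the parsed log events. The following is a minimal Python sketch, not a Ceph feature. It assumes each failure has already been reduced to a `(timestamp, reporter_osd, peer_osd)` tuple, for example from the heartbeat_check lines quoted above. `failures_in_window` and `failures_by_peer` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Iterable, Tuple

# One event per failed heartbeat: (timestamp, reporting OSD id, unresponsive peer OSD id).
Event = Tuple[datetime, int, int]


def failures_in_window(events: Iterable[Event], now: datetime,
                       window: timedelta = timedelta(minutes=10)) -> Counter:
    """Count failed heartbeats per (reporter, peer) pair within [now - window, now]."""
    start = now - window
    counts: Counter = Counter()
    for ts, reporter, peer in events:
        if start <= ts <= now:
            counts[(reporter, peer)] += 1
    return counts


def failures_by_peer(counts: Counter) -> Counter:
    """Sum the per-pair counts by peer: how often each OSD failed to answer."""
    by_peer: Counter = Counter()
    for (_reporter, peer), n in counts.items():
        by_peer[peer] += n
    return by_peer


if __name__ == "__main__":
    # The three log lines from this post, reduced to events.
    events = [
        (datetime(2016, 2, 8, 3, 42, 27, 810289), 9, 14),
        (datetime(2016, 2, 8, 3, 42, 27, 810297), 9, 15),
        (datetime(2016, 2, 8, 3, 42, 28, 311125), 9, 14),
    ]
    counts = failures_in_window(events, now=datetime(2016, 2, 8, 3, 50))
    print(counts[(9, 14)], counts[(9, 15)])  # 2 1
```

One plausible way to read the output: a peer that many different reporters cannot reach points at that peer's host or NIC, while a single reporter failing to reach many peers points at the reporter's side.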

-- 
Mariusz Gronczewski, Administrator

Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx
<mailto:mariusz.gronczewski@xxxxxxxxxxxx>


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
