Is there an equivalent of 'ceph health' for OSDs? Something that warns about slowness or communication trouble between OSDs?

I spent a good amount of time debugging what looked like nothing more than stuck PGs, but it turned out to be a bad NIC, and that only became apparent once I saw OSD logs like these:

2016-02-08 03:42:27.810289 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.14 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:27.810297 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.15 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:28.311125 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.14 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:28.311124)

(The bad NIC was an Emulex.)

Is there anything that could dump things like "failed heartbeats in the last 10 minutes" or similar stats?

-- 
Mariusz Gronczewski, Administrator
Efigence S. A.
ul. Wołoska 9a, 02-583 Warszawa
T: [+48] 22 380 13 13
F: [+48] 22 380 13 14
E: mariusz.gronczewski@xxxxxxxxxxxx
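To make the "failed heartbeats in the last 10 minutes" idea concrete, here is a rough sketch of what such a dump could look like when built from the OSD logs themselves. It is not a Ceph feature: the function name failed_heartbeats and the regular expression are my own, and the sketch assumes the log lines have exactly the heartbeat_check format quoted above, which may differ between Ceph releases.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Assumed line layout, taken from the log excerpt in this post:
# <date> <time> <thread id> <level> osd.N <epoch> heartbeat_check: no reply from osd.M ...
HEARTBEAT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+\s+-?\d+ "
    r"(?P<reporter>osd\.\d+) \d+ heartbeat_check: no reply from (?P<peer>osd\.\d+)"
)

# The three log lines from the post, used as sample input.
SAMPLE_LOG = """\
2016-02-08 03:42:27.810289 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.14 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:27.810297 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.15 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:27.810288)
2016-02-08 03:42:28.311125 7fc9b8bff700 -1 osd.9 146800 heartbeat_check: no reply from osd.14 ever on either front or back, first ping sent 2016-02-08 03:39:24.860852 (cutoff 2016-02-08 03:39:28.311124)
"""


def failed_heartbeats(lines, now, window=timedelta(minutes=10)):
    """Count heartbeat_check failures per (reporting OSD, silent peer) pair
    whose log timestamp falls within `window` before `now`."""
    counts = Counter()
    for line in lines:
        match = HEARTBEAT_RE.match(line)
        if match is None:
            continue  # not a heartbeat_check failure line
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        if now - window <= ts <= now:
            counts[(match.group("reporter"), match.group("peer"))] += 1
    return counts


if __name__ == "__main__":
    # Pretend "now" is shortly after the sample lines were logged.
    now = datetime(2016, 2, 8, 3, 45)
    for (reporter, peer), n in sorted(failed_heartbeats(SAMPLE_LOG.splitlines(), now).items()):
        print(f"{reporter} got no reply from {peer}: {n} time(s) in the last 10 minutes")
```

Against a real cluster the same function could be fed the lines of an OSD log file with now set to datetime.now(). A peer that shows up in the counts of several different reporters would point at that host's network rather than at any single OSD.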
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com