Hello,
I run a Ceph Nautilus 14.2.22 cluster with 144 OSDs. To be able to see
whether a disk has hardware trouble and might fail soon, I activated
device health management. The cluster is running on Ubuntu 18.04, so the
first task was to install a newer smartctl version; I used smartctl 7.0.
Device monitoring is activated (ceph device monitoring on). Using ceph
device get-health-metrics <device ID> I can see the results of the
smartctl runs for the device with the given ID, like this:
....
"product": "ST4000NM0295",
"revision": "DT31",
"rotation_rate": 7200,
"scsi_error_counter_log": {
"read": {
"correction_algorithm_invocations": 20,
"errors_corrected_by_eccdelayed": 20,
"errors_corrected_by_eccfast": 3457558131,
....
So this seems to run just fine. For failure prediction I selected the
"local" method (ceph config set global device_failure_prediction_mode
local).
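To summarize, the relevant commands I used were roughly these (with
<device ID> as a placeholder):

  ceph device monitoring on
  ceph config set global device_failure_prediction_mode local
  ceph device get-health-metrics <device ID>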
What's missing for me is the prediction output in ceph device ls: the
column "LIFE EXPECTANCY" is always empty and I have no idea why:
# ceph device ls
DEVICE                         HOST:DEV   DAEMONS  LIFE EXPECTANCY
SEAGATE_ST4000NM017A_WS23WKJ4  ceph4:sdb  osd.49
SEAGATE_ST4000NM0295_ZC13XK9P  ceph6:sdo  osd.92
SEAGATE_ST4000NM0295_ZC141B3S  ceph6:sdj  osd.89
....
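If I understand the documentation correctly, it should also be possible
to ask the predictor about a single disk directly, for example:

  ceph device predict-life-expectancy SEAGATE_ST4000NM0295_ZC13XK9P

but I am not sure whether that would return anything as long as the
LIFE EXPECTANCY column stays empty.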
Does anyone have an idea what might be missing in my setup? Is the "LIFE
EXPECTANCY" column perhaps only populated if the local predictor predicts
a failure, or should I see something like "good" there if the disk is ok
for the moment? Recently I even had a disk that died, but I did not see
anything in ceph device ls for that OSD's disk. So I am really unsure
whether failure prediction is working at all on my Ceph system.
Thanks
Rainer
--
Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
Web: http://userpages.uni-koblenz.de/~krienke
PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html