Re: ceph Nautilus: device health management, no infos in: ceph device ls


On Fri, Apr 29, 2022 at 10:43:54AM +0200, Rainer Krienke <krienke@xxxxxxxxxxxxxx> wrote:
> # ceph device ls
> DEVICE                            HOST:DEV  DAEMONS LIFE EXPECTANCY
> SEAGATE_ST4000NM017A_WS23WKJ4     ceph4:sdb osd.49
> SEAGATE_ST4000NM0295_ZC13XK9P     ceph6:sdo osd.92
> SEAGATE_ST4000NM0295_ZC141B3S     ceph6:sdj osd.89
> ....
> 
> Does anyone have an idea what might be missing in my setup? Is the
> "LIFE EXPECTANCY" column perhaps only populated when the local
> predictor predicts a failure, or should I see something like "good"
> there if the disk is currently OK? Recently a disk even died, but I did
> not see anything in `ceph device ls` for the dead OSD disk. So I am
> really unsure whether failure prediction is working at all on my ceph
> system.

If it is working correctly, you should see a rough time frame (from a
few days up to ">= 6 months") in that column. That said, it used to work
on my cluster when we were using SATA SSDs, but since we switched to
NVMe it has stopped showing anything. You can also check a specific
device with
`ceph device predict-life-expectancy SEAGATE_ST4000NM0295_ZC141B3S`, but
you will probably just get "unknown". You can check whether Ceph has
collected metrics for your device at all with
`ceph device get-health-metrics SEAGATE_ST4000NM0295_ZC141B3S`.
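
If you want to poke at those metrics programmatically, a sketch like the
one below may help. It assumes the general shape of
`ceph device get-health-metrics --format json` output (one smartctl
snapshot per scrape, keyed by timestamp); the exact schema depends on
the device and smartctl version, so the sample data here is purely
illustrative, not real output from any cluster:

```python
import json

# Hypothetical sample of `ceph device get-health-metrics --format json`:
# snapshots keyed by timestamp, each containing smartctl-style JSON.
# The attribute table shown here is a made-up, minimal example.
sample = json.dumps({
    "20220428-000000": {
        "ata_smart_attributes": {
            "table": [
                {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 0}},
            ]
        }
    },
    "20220429-000000": {
        "ata_smart_attributes": {
            "table": [
                {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 0}},
                {"id": 187, "name": "Reported_Uncorrect", "raw": {"value": 0}},
            ]
        }
    },
})

def latest_snapshot(metrics_json):
    """Return (timestamp, data) for the most recent scrape."""
    metrics = json.loads(metrics_json)
    ts = max(metrics)  # timestamps in this format sort lexicographically
    return ts, metrics[ts]

def raw_attr(snapshot, attr_id):
    """Look up a SMART attribute's raw value by id, or None if absent."""
    table = snapshot.get("ata_smart_attributes", {}).get("table", [])
    for row in table:
        if row["id"] == attr_id:
            return row["raw"]["value"]
    return None

ts, snap = latest_snapshot(sample)
print(ts, raw_attr(snap, 5))  # prints: 20220429-000000 0
```

In practice you would feed it the actual JSON from
`ceph device get-health-metrics <devid> --format json` and check the
attributes you care about (reallocated sectors, uncorrectable errors,
etc.) yourself, rather than relying on the predictor.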

I'm guessing most devices are simply not supported, but if you find more
information I'd be happy to hear it.

Florian
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
