Hi Alex,
> The cluster was idle at the time, being new and all. I noticed some
> disk-related errors in dmesg, but that was about it. For the next
> 20-30 minutes it looked to me like the failure had not been detected.
> All OSDs were up and in, and health was OK. The OSD logs had no
> smoking gun either.
> After 30 minutes, I restarted the OSD container and it failed to
> start, as expected.
If the cluster doesn't have to read from or write to a specific OSD
(or to specific sectors on that OSD), the failure won't be detected
immediately. We had an issue last year where one of the SSDs (used
for RocksDB and WAL) had a failure that was never reported. We only
discovered it when we tried to migrate the LVM volume to a new device
and got read errors.
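One way to surface such latent failures without waiting for client
I/O is to force reads against the suspect device. A rough sketch,
assuming osd.12 sits on /dev/sdc (both are just placeholders here):

# ceph osd deep-scrub osd.12
  (a deep scrub reads all object data in that OSD's PGs)
# smartctl -a /dev/sdc
  (checks the drive's own error counters)
# dd if=/dev/sdc of=/dev/null bs=1M iflag=direct status=progress
  (full sequential read of the raw device)

The deep scrub is the least disruptive option since Ceph schedules it
like any other scrub; the dd read touches every sector but can take
hours on a large drive.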
> Later on, I performed the same operation during an fio benchmark
> and the OSD failed immediately.
This confirms our experience: if there is data to read or write on
that disk, the failure will be detected.
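If you want to provoke that detection on an otherwise idle cluster,
pushing a bit of test I/O through the suspect OSD is usually enough.
For example (the OSD id and pool name are placeholders):

# ceph tell osd.12 bench
  (the OSD writes a small test workload to its own store)
# rados bench -p testpool 60 write -t 16
  (60 seconds of 4 MB object writes spread across the pool)

Neither is a real benchmark, but both generate exactly the kind of
reads and writes that make a dying disk show up in the OSD log.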
Please note that this was on a Luminous cluster; I don't know if and
how Nautilus has improved at sensing disk failures.
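For what it's worth, Nautilus does ship a devicehealth mgr module
that scrapes SMART data from the OSD hosts. From memory (check the
docs for the exact syntax on your release), something like this
should show what it knows about a drive:

# ceph device ls
  (lists known devices and the daemons using them)
# ceph device get-health-metrics <devid>
  (shows the SMART data scraped for one device)
# ceph device check-health
  (asks the mgr to re-evaluate device health now)

It still relies on the drive reporting problems via SMART, and in a
soft-removal test the device node disappears entirely, so I'm not
sure it would flag anything beyond a failed scrape in that scenario.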
Regards,
Eugen
Quoting Alex Litvak <alexander.v.litvak@xxxxxxxxx>:
Hello cephers,
I know that a similar question was posted 5 years ago, but the
answer was inconclusive for me.
I installed a new Nautilus 14.2.1 cluster and started pre-production
testing. Following a Red Hat document, I simulated a soft disk
failure with:
# echo 1 > /sys/block/sdc/device/delete
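(If you need to undo that kind of soft removal, a SCSI host rescan
along these lines should make the disk reappear; the host number is
system-dependent and host0 is just an example:

# echo "- - -" > /sys/class/scsi_host/host0/scan

The disk may come back under a different device name, so the OSD
might still need a restart afterwards.)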
The cluster was idle at the time, being new and all. I noticed some
disk-related errors in dmesg, but that was about it. For the next
20-30 minutes it looked to me like the failure had not been detected.
All OSDs were up and in, and health was OK. The OSD logs had no
smoking gun either.
After 30 minutes, I restarted the OSD container and it failed to
start, as expected.
Later on, I performed the same operation during an fio benchmark and
the OSD failed immediately.
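(For anyone wanting to reproduce this, any sustained fio workload
against a file backed by the cluster should generate enough I/O; a
sketch, where /mnt/rbd-test is just a hypothetical RBD or CephFS
mount:

# fio --name=randwrite --filename=/mnt/rbd-test/fio.dat --size=10G \
      --rw=randwrite --bs=4k --iodepth=32 --ioengine=libaio \
      --direct=1 --runtime=300 --time_based
)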
My question is: should the disk problem have been detected quickly
enough even on an idle cluster? I thought Nautilus had the means to
sense a failure before intensive I/O hit the disk.
Am I wrong to expect that?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com