Hi all,
I recently encountered a situation where some partially removed OSDs
left a number of PGs in my cluster "stuck inactive". The eventual
solution was to tell Ceph the OSDs were "lost". Because all of the
affected PGs were replicated elsewhere in the cluster, no data was
lost.
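
For reference, the fix was along these lines (osd.12 is just an
example id):

  # list PGs stuck in the inactive state
  ceph pg dump_stuck inactive

  # declare an unrecoverable OSD lost so the stuck PGs can peer again
  ceph osd lost 12 --yes-i-really-mean-it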
Would it make sense, or be possible, for Ceph to detect this
situation automatically (PGs stuck inactive but fully replicated
elsewhere) and take action to unstick the cluster? E.g. it could
mark the offending OSDs lost, or mark them down and out to the same
effect.
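
To sketch the detection side (hypothetical, this is not an existing
Ceph feature; the pg id is just an example):

  # find PGs stuck inactive
  ceph pg dump_stuck inactive

  # a stuck PG's peering info lists the OSDs it is blocked by
  ceph pg 2.5 query

  # if every blocking OSD is permanently gone and the PG has a
  # complete copy elsewhere, an agent could issue the same
  # "ceph osd lost" call as above (or "ceph osd down"/"ceph osd out")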
Ideally anything that can be safely automated should be. :)
Thanks!
C.