Hello,
Yesterday I finally managed to screw up my Ceph installation! :)
My cluster was at 80% capacity. I rebooted one of the OSD hosts remotely and
managed to break its fstab. The host failed to come back up, and while I was
driving from home to my office, Ceph took recovery action. That recovery
completely filled another OSD, which then failed. Ceph continued to recover
and killed the remaining OSDs in the same fashion. Not good. Attempts to
restart the OSDs were in vain: they could not run their xattr test because the
file system was full, and only growing the file system allowed them to
restart.
This leads me to a question/proposal: is there a feature that lets Ceph halt
the recovery process if any live OSD exceeds, say, 95% capacity? This is
quite distinct from the existing "full" and "near full" OSD conditions: those
concern writes coming from clients, and blocking those writes locks the
clients up. Halting recovery, by contrast, would let clients continue even
though Ceph remains in a degraded state. It makes no sense to me to let Ceph
go from a degraded state to a crashed state when no client requires it.
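To make the proposal concrete, here is a minimal sketch of the intended
check. All names and the threshold are illustrative assumptions, not actual
Ceph internals: the idea is simply that before scheduling a recovery
operation, the utilization of every live OSD is inspected, and recovery is
paused (while client I/O continues under the normal full/near-full rules) if
any OSD is above a hard ratio.

```python
# Hypothetical sketch of the proposed recovery guard. The function name,
# stats format, and RECOVERY_HALT_RATIO are illustrative, not Ceph APIs.

RECOVERY_HALT_RATIO = 0.95  # proposed threshold, separate from near-full/full


def should_halt_recovery(osd_stats):
    """osd_stats: list of (osd_id, used_bytes, total_bytes) for live OSDs.

    Returns True if recovery should pause because some live OSD is at or
    above the halt ratio; client writes are not affected by this check.
    """
    for osd_id, used, total in osd_stats:
        if total > 0 and used / total >= RECOVERY_HALT_RATIO:
            return True  # pausing recovery keeps this OSD from filling up
    return False


# Example: OSD 2 is at 96% utilization, so recovery would be paused.
stats = [(0, 70, 100), (1, 80, 100), (2, 96, 100)]
print(should_halt_recovery(stats))
```

With such a guard, the scenario above would have stopped at "degraded": once
the first surviving OSD crossed 95%, recovery would pause instead of filling
it to 100% and cascading the failure.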
Regards,
Vladimir
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html