Hi all,

we need to prepare for temporary shutdowns of a part of our Ceph cluster. I have 2 questions:

1) What is the recommended procedure for temporarily shutting down a Ceph FS quickly?
2) How do I avoid the MON store overflowing with log spam (on Octopus 15.2.17)?

To 1: Currently, I'm thinking about:

- fs fail <fs-name>
- shut down all MDS daemons
- shut down all OSDs in that sub-cluster
- shut down MGRs and MONs in that sub-cluster
- power servers down
- mark OSDs out manually (their number will exceed the MON limit for auto-out)
- power up
- wait a bit
- do I need to mark the OSDs in again, or will they join automatically after a manual out and restart (maybe just temporarily increase the MON limit at the end of the procedure above)?
- fs set <fs_name> joinable true

Is this a safe procedure? The documentation calls this a procedure for "taking the cluster down rapidly for deletion or disaster recovery", and neither of the two is our intent. We need a fast *reversible* procedure, because an "fs set down true" simply takes too long. There will be Ceph FS clients remaining up. The desired behaviour is that client I/O stalls until the fs comes back up and then just continues as if nothing had happened.

To 2: We will have a sub-cluster down for an extended period of time. There have been cases where such a situation killed MONs due to an excessive amount of non-essential log entries accumulating in the MON store. Is this still a problem with 15.2.17, and what can I do to reduce it?

Thanks for any hints/corrections/confirmations!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
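On the MON store question: two knobs that exist in Octopus and can help keep store growth visible and in check are sketched below. This is a sketch, not a recommendation; the values are illustrative only, and it addresses store size rather than the log traffic itself.

```ini
# ceph.conf sketch (Octopus option names; values illustrative only)
[mon]
# Compact the mon store (RocksDB) at daemon start, reclaiming space
# left behind by accumulated log and map entries.
mon compact on start = true

# Threshold in bytes for the MON_DISK_BIG health warning
# (the Octopus default is 15 GiB).
mon data size warn = 16106127360
```

A monitor's store can also be compacted at runtime with `ceph tell mon.<id> compact`. Note that compaction only reclaims space; it does not reduce what the daemons send to the cluster log in the first place.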
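For reference, the shutdown/restart sequence proposed above can be sketched as a dry-run script. All concrete names here are placeholders I made up (file system `cephfs`, OSD ids 0-2), and treating `mon_osd_min_in_ratio` as the "MON limit for auto-out" is my assumption; `CEPH="echo ceph"` prints each command instead of executing it, so the sequence can be reviewed safely before running it for real with `CEPH=ceph`.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the proposed shutdown/restart sequence.
# Placeholders: fs name "cephfs", OSD ids 0-2. Set CEPH=ceph to execute.
CEPH="echo ceph"

# 1) Fail the file system so remaining clients block instead of erroring out.
$CEPH fs fail cephfs

# 2)-5) Stop the daemons on each affected host (run there via systemd), e.g.:
#   systemctl stop ceph-mds.target ceph-osd.target
#   systemctl stop ceph-mgr.target ceph-mon.target
# ...then power the servers down.

# 6) Mark the affected OSDs out manually; automatic marking-out of this
#    many OSDs would be capped (assumption: by mon_osd_min_in_ratio).
for id in 0 1 2; do
    $CEPH osd out "$id"
done

# 7) After power-up: if the restarted OSDs do not rejoin on their own
#    after the manual out, mark them in explicitly.
for id in 0 1 2; do
    $CEPH osd in "$id"
done

# 8) Make the file system joinable again so the MDS daemons take over.
$CEPH fs set cephfs joinable true
```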