Hi there,

Recently I ran into a problem triggered by rebooting my Ceph nodes, which eventually ended with me rebuilding the cluster from the ground up. The TL;DR question is: are there suggested best practices for taking a Ceph node offline and bringing it back online?

Following the official Ceph documentation, I set up a 4-node Ceph (Firefly) cluster in our lab last week. It consists of 1 admin node, 1 monitor node and 2 OSD nodes. All 4 nodes are physical servers running Ubuntu 14.04; no virtual machines are involved. The MDS and a 3rd OSD actually run on the monitor node. Everything looked fine at that point: 'ceph -w' reported HEALTH_OK, I could see all the available storage capacity, and I was happily writing Python code against the Ceph S3 API.

This Monday I ran apt-get upgrade on all 4 machines and then rebooted them. Once all 4 were back online, 'ceph -w' reported errors about the monitor node's IP address, along with messages like:

  {timestamp} 7fb6456f2500 0 -- {monitor IP address}:0/2924 ...... pipe(0x5516270 sd=97 :0 s=1 pgs=0 cs=0 l=1 c=0x5c54e20).fault

(Sorry, I didn't save the exact error logs since I've already rebuilt the cluster, my mistake :-( .)

Due to lab policy, only DHCP is allowed, so I updated the monitor's IP address in /etc/ceph/ceph.conf and tried to push the config to all nodes, but that didn't work. Then I tried restarting the ceph service on those nodes; no luck. I even went down the ceph-deploy purgedata route; no luck again. In the end I had to purge everything and start over from zero. Again, I'm sorry no error messages were saved; I was just too frustrated.

Now I have a working cluster again, but I don't think I can afford to redo it another time. So, the question mentioned above: how should I properly do maintenance work (upgrades, reboots) without breaking my Ceph cluster? Is there a procedure or a set of commands I should run after rebooting?

Thanks

Br.
J Hewitt
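
P.S. In case it helps with the diagnosis, here is roughly what I ran after editing ceph.conf. I didn't keep a shell history, so these are reconstructed from memory; the node names (mon-node, osd1, osd2) are placeholders for my real hostnames and the exact invocations may have differed slightly:

  # push the edited ceph.conf from the admin node to the other nodes
  ceph-deploy --overwrite-conf config push mon-node osd1 osd2

  # restart the ceph daemons on each node (Ubuntu 14.04, upstart)
  sudo restart ceph-all

  # when that didn't help, wipe the data and eventually purge everything
  ceph-deploy purgedata mon-node osd1 osd2
  ceph-deploy purge mon-node osd1 osd2
  ceph-deploy forgetkeys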